Lessons from a scandal


The Karolinska Institute has rightly tightened procedures in response to the controversy 
surrounding surgeon Paolo Macchiarini — but it should not do so to the detriment of its science. 


The Nobel Foundation next month will announce who has won
the 2016 Nobel Prize in Physiology or Medicine. The Karolinska
Institute (KI) in Stockholm, one of Europe's most highly ranked
research institutions, will have selected those winners, as it has each year
since 1901. The KI’s reputation for intellectual quality and integrity has
been a beacon in the world of biomedicine. But this year, that reputation
has been rocked by a scandal.

In 2010, the KI recruited Paolo Macchiarini, a charismatic thoracic
surgeon who had performed the world’s first tracheal transplant using
a donated windpipe seeded with the patient’s stem cells. At the KI,
he wanted to pioneer similar transplants using synthetic windpipes.
Things didn’t go so well. In the next few years, various allegations of
clinical and scientific misconduct were brought against him. Yet the
KI continued to clear him and to extend his employment.

Outside the sober scientific environment, other sides of
Macchiarini’s character were coming to light. In January, Vanity Fair
magazine published a story about a US television news producer who
said that Macchiarini had promised to marry her in a ceremony
overseen by the Pope. Yet the surgeon seemed to be married already. The
story included claims, since verified, that he had embellished his CV.
The controversy hit the headlines when Swedish Television aired a
moving three-part documentary following Macchiarini’s work at the
Karolinska University Hospital and, when he was stopped from
doing further transplants there in 2013, at a university hospital in
Russia. The images of a young Russian woman who had the operation,
and subsequently died, burnt into the Swedish psyche. Her life
had not been in immediate danger, which would have been the only
justification for such experimental surgery.

Things finally happened. The KI declined to renew Macchiarini’s
contract, and Swedish police are investigating a possible case of involuntary
manslaughter and grievous bodily harm. Key figures in the affair,
including the KI vice-chancellor and the dean of research, resigned their posts.
Another resigned from his post as secretary-general of the KI Nobel
Committee. The KI and its hospital both commissioned reports from
independent experts, who have now published their results. They paint a
damning picture, saying that the KI recruited Macchiarini despite
negative professional references. In the rush to recruit, and hold on to, a bold
clinician who promised a groundbreaking therapy using fashionable
techniques, the upper echelons of the KI blinded themselves to
warning signs, cutting regulatory corners to make sure that nothing would
block the appointment. The KI seemed similarly blind when it renewed
Macchiarini’s contracts in 2013 and 2015, and it failed to follow
regulations on handling allegations of scientific misconduct. Both the KI and
the hospital have accepted the findings in the reports. Macchiarini has
declined to comment to Nature.

Some KI scientists put the behaviour of their senior management
down to increased government pressure to translate research from the
lab to the clinic as fast as possible. But as noted by Sten Heckscher, a
former president of Sweden’s Supreme Administrative Court who led
the investigation into the KI, most institutions don't respond to such
pressures in this way.

Public trust in the KI has plummeted, according to the latest national 
opinion poll on Swedish universities, in which it fell from fourth in 2015 
to twelfth this year. Outside Sweden, at least in scientific circles, its
wider reputation might well be saved by how it has handled the affair
since February. It has adopted a tactic of complete openness: a timeline
of relevant events
is available in English and Swedish on the KI 
homepage, and is regularly updated. The KI’s 
earlier weak, now discredited, responses to allegations of misconduct
are collected on the same dedicated page (go.nature.com/2cjunzr).

The KI and its university hospital have learnt from the affair and 
have already fine-tuned many of their procedures, including those 
for recruitment and handling whistle-blowers. Still, the KI should 
not tighten its procedures so much that it no longer feels comfortable 
taking justifiable scientific risks. The institute has gained its standing 
in large part through its willingness to be adventurous in research. 
Observing its exemplary approach to the scandal, the world of
biomedicine might yet forgive the KI this one major slip. It will not forgive
a slip into mediocrity.


Time machine 


Science fiction fights the past as much as it 
faces the future. 


Back in 1969, you could buy a stake in the future, even if it was
only a plastic model kit of the Apollo Lunar Module. But the price was
stuck in the past. The UK kit cost 5 shillings and 11 pence, in a
pre-decimal system that dated back to the Middle Ages, with abbreviations
that recalled the Roman occupation of Britain: the penny was abbreviated
to ‘d’, standing for ‘denarius’.

Such archaisms angered and frustrated Herbert George Wells
(1866-1946), whose raillery against such relics is documented in
Simon James's retrospective on page 162 as part of this week’s
science-fiction special issue. It is followed, on page 165, by Sidney
Perkowitz’s appreciation of Star Trek, the space-opera TV and movie
franchise that has been visiting strange new worlds since 1966.

Britain changed to decimal coinage in 1971, but even countries that
are long used to money in multiples of ten can’t escape the history of
their currency. The word ‘dollar’, for example, derives from ‘thaler’, a
coin that can be traced back to early sixteenth-century Bohemia; and
the abiding fondness of one of the planet's most technically advanced 
nations for non-SI units is a source of some embarrassment or hilarity
(depending on your point of view). The past is a loam of inertia
through which the shoots of futurity struggle to emerge. As cyberpunk 
author William Gibson once said, the future is already here — it’s just 
not very evenly distributed. 

Wells had every reason to fight for the future. Fate saw him born 
into the smokes and stinks of Victorian Britain as the son of servants 
in 1866. Like all years, it was a potpourri of past and future: it was the 
year of the long-forgotten Austro-Prussian War between two ageing 
empires that have long since crumbled, but also the year that the Royal 
Aeronautical Society was founded, and that Alfred Nobel invented 
dynamite. 

Wells nimbly avoided his fate of becoming a haberdasher and ended 
as one of the visionaries of his age, regularly published in these pages. 
Pulling himself upwards to the light became a personal as well as a 
professional preoccupation. 

Living as we do in a much gentler age (for all that it occasionally 
seems otherwise), we are inclined to dissect Wells’s achievement into 
discrete anticipations of such technological gewgaws as tanks and 
atomic bombs, without appreciating his drive and ambition to better 
not just himself, but the rest of humanity. We are likewise inclined to 
forget that his first full-length novel, The Time Machine, is not just a 
fantasy of the far future but an excoriating damnation of the class
system, in which the classes evolve into two separate but interdependent
species: the leisured, effete and mindless Eloi, preyed on by the ugly
and industrious Morlocks. This is no hidden allegory: as a character
says in The Soul Of A Bishop (1917), one of Wells's non-science-fictional
novels, “we are the Morlocks, coming up!”.


One could be flippant and say that the importance of Wells’s work
now lies in its intriguing mix of old and new: Wells was steampunk
when steam was still punk, his futuristic machines tricked out
in hand-tooled leather and knurled brass. But Wells earns his place, in
the words of Brian Aldiss (in Trillion Year Spree), as the ‘Shakespeare
of sci-fi’ because he takes ordinary people and tests their reactions to
technology and its consequences: shaven monkeys from Woking pitted
against the intellects of Martians, vast and cool and unsympathetic.

Star Trek first aired in the centenary year of the US Civil Rights Act
of 1866, an appropriate date, seeing as the show’s prime aim was to
depict a harmoniously integrated future society rather than anticipate
technological marvels such as the tricorder and the cloaking device.
Arthur C. Clarke, another titan of sci-fi, dismissed (in The Songs of
Distant Earth) one such technological trinket, the warp drive, as simply
a McGuffin that allowed the crew to get from one locale to the next “in
time for next week's exciting episode”. Star Trek creator Gene
Roddenberry, like Wells, drew his passion from a need to rise above the
inequities of the present and forge a more equitable future.

Why are we celebrating Wells and Star Trek now, in this sci-fi special 
(which includes on page 259 our long-running Futures sci-fi series
presented as a graphic novel for the first time)? It happens to be 150 years
since Wells’s birth, 70 years since his death and 50 years since Star Trek
was first aired. All satisfying multiples of ten, but measured in units
based on the revolution of a small planet round an unremarkable star in
the suburbs of an ordinary galaxy. As Wells lamented, we are shackled to
our past. It might be a while before we run such commemorations based
on binary representations of elapsed numbers of Planck time units.


Where are 
the data? 


As the research community embraces data sharing, academic
journals can do their bit to help. Starting this month, all 
research papers accepted for publication in Nature and an initial 
12 other Nature titles will be required to include information on 
whether and how others can access the underlying data. 

These statements will report the availability of the ‘minimal 
data set’ necessary to interpret, replicate and build on the findings 
reported in the paper. Where applicable, they will include details 
about publicly archived data sets that have been analysed or
generated during the study. Where restrictions on access are in place
— for example, in the case of privacy limitations or third-party 
control — authors will be expected to make this clear. 

The new policy (full details of which are available at
go.nature.com/2bf4vqn) builds on our long-standing support for data
availability as a condition of publication. It also extends our support for
data citation, the practice of citing data sets in reference lists in a 
similar way to citing papers. Authors are encouraged to cite data 
sets that have digital object identifiers (DOIs) assigned to them. 

The introduction of data-availability statements follows a trial
at five Nature journals (Nature Cell Biology, Nature Communications,
Nature Geoscience, Nature Neuroscience and Nature Physics) that began
in March 2016. The pilot confirmed differences in
the culture of data sharing and access between different disciplines,
and that the lack of obvious, public, community repositories can
pose a significant barrier to public data deposition. Nevertheless,
even in disciplines that are not yet so able to embrace openness
and sharing, there is increasing awareness and appreciation that 
data deposition can enhance the visibility and reuse of published 
research, and that data citation can increase the recognition of 
those who create and share data. 

This new policy will be implemented across the diverse range of 
Nature journals by early 2017. We expect that its implementation 
will shed more light on the reasons for disciplinary differences in 
data sharing, identify challenges and help to promote the practice 
more widely. 

It’s not just journals. A broad drive across the research, funding 
and publishing communities is under way to make the availability 
of research data more transparent. Funders, for example, are also 
introducing data-availability statements. The seven UK research 
councils require their grant holders to include them. And the 
US National Institutes of Health is asking researchers to provide 
management plans for their research data. 

We expect that offering consistent information on data
availability in our papers will promote data reuse by future researchers.
And where public data archiving is a mandatory requirement of 
journals, there is some evidence that including data-availability 
statements with persistent links to data in published articles is an 
effective approach to ensuring public data availability and policy 
compliance (T. H. Vines et al. FASEB J. 27, 1304-1308; 2013). 

This new policy follows the launch, in July 2016, by our publisher
Springer Nature of an ambitious project to introduce
and standardize research data policies across all of its journals
(see go.nature.com/2by6l6x). The project sets out a defined common
framework for data policy, with which Nature policies align, that
enables different journals to encourage data sharing
in a way that reflects the circumstances of respective specialist
communities.
When does history begin? Can we anticipate which of our
contemporary events the historians of the future will
find most interesting? A century from now, will there be
universal acceptance of genetically modified (GM) crops, with little 
sign of the protest and controversy that has surrounded them until 
now? Or will those objections have killed off the development of what 
was once seen as a promising new technology? 

Either way, events of the past two decades will be of great interest. 
Future historians could view this period either as signalling the birth 
of opposition to GM crops or as offering a case study of how and why 
that opposition was once significant — and how it was overcome. 

Hoping to help those future historians, I and others have gathered 
a historical archive of material relevant to the debate over GM crops 
and the food derived from them. 

It became clear more than ten years ago, quite
early in the debate, that an interesting phenomenon
was unfolding. A new set of scientific technologies
had provoked widespread reactions,
many of them antipathetic for a wide variety of
reasons (including health risks), which themselves
became topics for fierce argument and discussion.

The science underpinning the deployment of 
the technology and the safety of GM products 
was attested by most of the scientific community 
and essentially all of the official agencies internationally
responsible for food and environmental
safety. Opposition, it seemed to most scientists, 
was clearly not based primarily on the validity 
of scientific findings, although many opponents 
claimed that it was. Those counter-arguments 
were rejected by most scientists, who perceived 
them as motivated by political, commercial and other interests for which 
scientific validity was, at best, of secondary importance. 

This was not the first vigorous public reaction to new technologies.
Innovation is often accepted with alacrity (think of the Sony
Walkman and the iPhone) but sometimes causes trouble. Riots in
nineteenth-century London against compulsory smallpox vaccination
of children (many parents then, as now, felt they should have the choice)
were followed by objections in Oklahoma to the electric telegraph
connection with New Orleans, which would bring bad news and encourage
gambling. There were (and remain) objections to milk pasteurization 
and to mobile-phone transmitters, not to mention nuclear power. 

The effort to prepare an archive of the GM debate began in 2008, 
when it became clear that the GM crop and food phenomenon would 
be a useful way to study societal reactions to new technologies.

Whatever the eventual outcome of the debate, we realized that there 
would be many lessons to learn about how (and how not) to introduce 
a new technology, as well as whether (or not) it might be wise to do 
so. Genetic modification would be an important subject for future, as
well as contemporary, study, but much would be lost if records and
ephemera of all sorts were not retained under safe conditions.

The debate over GM crops is making history

An archive of material from all sides of the UK genetic-modification controversy
is up and running and welcomes contributions, says Vivian Moses.

We cannot know in advance what aspects of GM crops will be of 
interest to future scholars, so it is best to keep as much material as 
possible. Although archives are usually established in retrospect, as 
and when historical subjects attract interest, we set out to do so in 
prospect, knowing from the outset that we have an interesting and
pertinent phenomenon to record. It would be presumptuous to estimate
the archive’s future value, but we did predict that, without it, a time 
would come when its absence would be regretted. 

With collaboration from the British Library, we began a project with 
the Science Museum in London to find and preserve eligible papers, 
films, tapes, disks, websites, equipment and more. (We have no
facilities for storing biological material.)

Much of the vulnerable material held by individuals
needed to be secured before it was thrown
away. By 2008 it was already late: filing cabinets 
are periodically cleaned out. Nevertheless, much 
interesting material was still held by scientists 
and other academics, industry, farming interests, 
government, campaigners, the media and others. 

We planned a global archive, but talking to colleagues
in the United States and elsewhere quickly
showed that this was overambitious. Moreover, 
the Science Museum's remit is to collect material 
mainly from UK sources. So the archive focuses on 
the debate in Britain, which has been particularly 
strong and for which a large amount of material is 
available. The archive contains important records, 
including correspondence, from researchers, 
campaigners and the public-relations firms used 
by the biotech companies to try to counter opposition. 

Space and facilities had to be organized before the archive became
public, but it is now finally open for use, housed at the Science Museum’s
Wroughton site, near Swindon (see go.nature.com/2btqdk1). It comprises
dozens of box files across 23 metres of shelf space and includes
correspondence on the controversial publication of research that claimed
to show health impacts of GM potatoes. Pending funding to prepare
a full catalogue, a broad listing of contents is available at
go.nature.com/2cjptjq. (Click to search Science Museum, London; enter
‘genetic’ in the search box; select ‘Title’ in ‘Sort By’ and finally click
on ‘Search’.)

We continue to seek relevant material, and hope that UK colleagues 
will contribute more to the Wroughton collection and that others 
around the world will be inspired to establish GM archives in their 
own countries. We live in interesting times. Let's preserve them.


Vivian Moses is visiting professor of biotechnology at King’s College 
London. 
e-mail: v.moses@qmul.ac.uk 
Selections from the
scientific literature 


RESEARCH HIGHLIGHTS 


ASTRONOMY 


Carbon monoxide 
in large-star disks 


Stars twice as massive as
the Sun can feature
carbon-monoxide-rich gas disks
around them, contrary to the 
expectation that ultraviolet 
radiation would have stripped 
away the gas. 

Meredith Hughes at 
Wesleyan University in 
Middletown, Connecticut, 
and her colleagues used the 
Atacama Large Millimeter/ 
submillimeter Array in 
northern Chile to probe the 
regions around 24 young star 
systems, only about 5 million 
to 10 million years old. They 
chose stars surrounded
by a disk of dust debris,
resembling a scaled-up
version of the Solar System's 
Kuiper belt. This leftover 
material could form new 
planets, including gas giants. 
Surprisingly, three of the larger 
stars in the sample had strong 
carbon monoxide emissions. 
Astrophys. J. 828, 25 (2016) 


CANCER

‘Perfect storm’ of
cancer risk


The ability of an organ’s stem
cells to generate new tissue over
time (the cells’ generative
capacity) determines how
prone that organ is to cancer.
Scientists have debated the
relative importance of factors
that contribute to an organ’s
cancer risk, including ‘intrinsic’
factors such as the number
of stem-cell divisions and
‘extrinsic’ factors that cause
tissue and DNA damage. To
compare these factors, Richard
Gilbertson at the CRUK
Cambridge Institute, UK,
Arzu Onar-Thomas at St Jude
Children’s Research Hospital
in Memphis, Tennessee, and
their colleagues studied stem
cells called Prom1+ cells with
varying levels of generative
capacity in different organs
in mice of various ages. The
authors introduced key
cancer-causing mutations into the
cells, then looked for tumour
growth in the organs.

The team found that cancer
risk correlated closely with
the generative capacity of the
Prom1+ cells. In liver tissue,
cancer mutations alone did not
cause cancer; tissue injury
significantly increased cancer
susceptibility. The authors
propose that several factors
contribute to a ‘perfect storm’
of tumour growth: mutated
stem cells and extrinsic factors
that trigger cell proliferation.
Cell http://doi.org/bp73 (2016)


DISEASE ECOLOGY


Rapid evolution of cancer resistance


Tasmanian devils have developed a degree of
genetic resistance to a virulent contagious facial
cancer in just four to six generations.

Andrew Storfer at Washington State
University in Pullman and his colleagues
sequenced about one-sixth of the genome
for 294 devils (Sarcophilus harrisii) from
3 wild populations. The authors used samples
collected both before and after the groups first
encountered the facial cancer.

The team found five genes spread across
two regions of the genome that showed strong
signs of selection, including a large number
of single-DNA-base changes, throughout the
devil populations. Two of the genes, CD146 and
THY1, are known to help the immune system to
recognize foreign cells in other animals.
Nature Commun. 7, 12684 (2016)


Trees flourish on
the happy edge


As the climate warms, sugar
maples expanding their
populations uphill could
outrun their insect predators
and flourish on the ‘happy
edge’ of their range.
Morgane Urli and her
colleagues at the University
of Sherbrooke in Quebec,
Canada, transplanted
two-year-old sugar maples
(Acer saccharum) uphill to
sites just at, and beyond, their
current elevation range limit.
Some were given protection
from herbivores. Of seedlings
without protection, more than
75% at the range edge and
beyond survived, compared
with just 30% at the centre
of the current range. The
difference narrowed markedly
in protected plants, suggesting
that the increased survival was
largely due to ‘enemy release’ at
and beyond the current range.
Previously, the team showed
that seed predation beyond
elevation range limits is very
high. However, those few
seeds that do escape can look
forward to a healthy future.
Ecology http://doi.org/bp5t
(2016)


Lazy bustards 
live longer 


Migration in great bustards 
seems to be on the decline 
because many of those that do 
migrate die in collisions with 
power lines. 

Carlos Palacin at the 
National Museum of Natural 
Sciences in Madrid and his 
colleagues captured and radio- 
tagged 180 male great bustards 
(Otis tarda) across 29 breeding 
groups, covering most of the 
species’ range in Iberia. Only 
some birds migrated north 
in summer. Of those that did, 
21.3% died in crashes with 
power lines, whereas just 6.3% 
in the sedentary group died in 
this way. 

The authors found a steady 
increase in the proportion of 
non-migratory males over the 
study period, from 17% in 1997 
to 45% in 2012. They propose 
that males decide whether to 
migrate by observing other 
males. Thus, as the number 
of migrators declines, the 
behaviour may die out. 

Conserv. Biol. http://doi.org/bp53 
(2016) 


Tiny pterosaurs’ 
tenure extended 


The discovery of a surprisingly 
small fossilized pterosaur 
(pictured with domestic cat for 
scale) in rock some 77 million 
years old challenges the 
accepted history of the winged 
reptiles. Scientists had thought 
that, by around 100 million 
years ago, small pterosaurs had 
been replaced by larger species. 
Elizabeth Martin-Silverstone
at the University of
Southampton, UK, and her
colleagues uncovered a wing 
bone and vertebrae from a 
pterosaur in 80-million- to 
72-million-year-old rock 
formations in British 


Columbia, Canada. Although 
the creature’s 1.5-metre 
wingspan was tiny compared 
with that of the 10-metre 
giants known from this period, 
bone analysis revealed that it 
was almost fully grown. 
Fossilized juveniles of larger 
pterosaur species from this 
period are also rare, suggesting 
that the record may be biased 
against small pterosaurs. 
R. Soc. Open Sci. 3, 160333 (2016) 


Bone cells on 
demand 


Researchers have come up 
with a simple recipe for 
making bone from stem cells. 
Embryonic stem cells can 
form every type of tissue in the 
body, but methods for forcing 
these and other pluripotent 
stem cells to differentiate into a 
specific type can be inefficient 
and costly. A team led by Shyni 
Varghese at the University of 
California, San Diego, added a 
chemical called adenosine — 
which occurs naturally in the 
body — to human stem-cell 
cultures and produced bone- 
making cells called osteoblasts 
in under three weeks. The 
cultured osteoblasts generated
calcified bone, and scaffolds 
that had been coated with the 
osteoblasts and implanted into 
mice repaired skull defects. 
Sci. Adv. 2, e1600691 (2016)


GENETICS 


Synthetic DNA 
overreacts to light 


Synthetic DNA bases created 
in 2014 to expand the genetic 
code are light-sensitive and 
produce reactive oxygen 
species (ROS) when exposed 
to certain wavelengths. 
Ultraviolet light can 
damage natural DNA bases,
but cells have in-built repair 
mechanisms to fix this. Carlos 
Crespo-Hernandez of Case 
Western Reserve University 
in Cleveland, Ohio, and his 
co-workers found that two lab- 
made DNA bases — d5SICS 
and dNaM, which have been 
used to design semi-synthetic 
bacteria — generate up to 
100 times more reactive species 
than the most reactive natural 
base, thymidine, when exposed 
to near-visible wavelengths 
of light. In response to light 
exposure, a carcinoma cell 
line grown with d5SICS had 
higher levels of ROS, and cell 
proliferation was reduced. 
Synthetic DNA bases may 
accelerate photochemical 
damage to cells, the authors say. 
J. Am. Chem. Soc. http://doi.org/ 
bp55 (2016) 


Melting ice opens 
Arctic to shipping 


Thanks to melting Arctic sea 
ice, ships with moderate ice 
strengthening (lighter than 
currently required, pictured) 
may be able to travel northern 
waters all year round by the 
century’s end. 

Nathanael Melia and his 
colleagues at the University 
of Reading, UK, used several 
global climate models to 
simulate the fastest shipping 
routes through the Arctic, 
depending on future 
greenhouse-gas emissions. In 
their most extreme scenario, 
the route from Yokohama in 
Japan to Rotterdam in the
Netherlands becomes
13 days shorter than
alternative routes by 2100.

Even ordinary vessels could 
see the period during which 
they can navigate Arctic waters 
double by mid-century. 
Geophys. Res. Lett. http://doi. 
org/bp5x (2016) 


Star-rich early 
galaxy clusters 


Galaxy clusters in the early 
Universe produced more 
stars than their more modern 
counterparts. 

When a galaxy becomes
part of a cluster — a group 
of galaxies bound together 
by gravity — its crowded 
surroundings often cause it 
to stop producing stars, an 
effect called environmental 
quenching. Using the Keck 
Observatory in Hawaii and the 
Very Large Telescope in Chile, 
a team led by Julie Nantais at 
the Andres Bello University in 
Santiago observed four galaxy 
clusters nearly 10 billion years 
old. They found that, in these 
early clusters, only about 
30% more of the galaxies had 
stopped producing stars than 
had the surrounding galaxies, 
compared with a difference of 
about 50% in newer clusters. 

Knowing how quenching 
changes over the history of the 
Universe may help scientists 
to determine why the cluster 
environment causes the 
phenomenon. 
Astron. Astrophys. 592, A161
(2016) 
UK chief of chiefs


The UK government opened 
applications on 30 August for 
the post of chief executive of 
UK Research and Innovation 
(UKRI), the body that will 
unite the country’s nine 
existing research-funding 
bodies. The job comes with
a salary package of around
£300,000 (US$400,000). The
salary, about twice that
of existing research-council
chiefs, should be enough
to lure university heads, say
observers. The head of UKRI,
which is yet to be created by 
parliamentary legislation, will 
oversee an annual budget of 
more than £6 billion. 


PEOPLE

Macchiarini inquiry


Paolo Macchiarini, a disgraced
surgeon who was formerly
a visiting professor at the
Karolinska Institute (KI) in
Stockholm, worked in an
environment that fostered
a “culture of silence” and
a “nonchalant attitude
towards regulations”. Those
are the conclusions of two
independent inquiries
initiated by the KI and its
affiliated university hospital,
where Macchiarini carried
out three artificial-trachea
operations, including the
world’s first, between 2011 and
2013. Two of the patients died.
Allegations of misconduct
against Macchiarini emerged
in 2014, but the KI cleared
him. A Swedish documentary
about his work, aired in
January, reopened the issue.
The KI dismissed him in
March. Macchiarini said
this week in a statement to
Swedish broadcaster SVT that
he is not guilty of research
mismanagement, and has
always done the best for his
patients. Two members of the
KI’s Nobel Assembly, which
awards the medicine prize,
have been asked to resign as
a result of their links to the
affair. See page 137 for more.


NUMBER CRUNCH


30%


The decline in Africa’s
savannah elephant
population from 2007
to 2014, equal to about
144,000 elephants. The
current rate of decline is
8% per year, with poaching
mainly to blame.

Source: Great Elephant Census


‘Ring of fire’ eclipse glows in African sky


A spectacular ‘ring of fire’ annular solar
eclipse passed over southern parts of Africa
and the Indian Ocean on 1 September.
Astrophotographers captured the glowing ring
of the Sun (pictured, as seen from Réunion) for
the roughly three minutes of the eclipse. Annular
eclipses happen when the Moon and Earth are
slightly farther away from one another in their
orbits than they are during a total eclipse, so the
Moon appears smaller when it passes across
the face of the Sun. The resulting ring is known
as an annulus. The next annular eclipse will be
fleetingly visible, for 44 seconds, over the south
Atlantic Ocean on 26 February 2017.


Roger Tsien dies

Nobel-prizewinning
chemist Roger Tsien, who
used a jellyfish protein to
illuminate molecular biology,
died on 24 August, aged
64. Tsien’s work with green
fluorescent protein turned
it into a laboratory staple as
researchers around the world
used it to label molecules to
track their expression and
movements in living cells.
Tsien worked at the University
of California, San Diego, and
shared the chemistry Nobel
with two other researchers
in 2008. See
go.nature.com/2clfomy for more.


Gene-drive vote


Members of the International
Union for Conservation of
Nature (IUCN) have voted
for a moratorium on research
into the use of gene drives for
conservation. Gene drives
allow genetic modifications to
be rapidly spread through wild
populations, and some people
have proposed that they could
be used to wipe out invasive
species and restore natural
ecosystems. But the approach
raises concerns about possible
unintended consequences of
releasing gene drives into the
environment.


Animal research

In a policy turnaround, the Alliance of Science Organisations in Germany has launched an Internet platform for education and discussion on research with animals (tierversuche-verstehen.de). Germany has been criticized for bucking the European trend towards openness on animal research. The website, launched on 6 September, allows interactive discussion and provides a service to journalists seeking scientific expertise. It includes extensive information, also through a YouTube channel, on the legal environment for such research, as well as personal stories from scientists who work with animals, including non-human primates.


RESEARCH

Free Neanderthal

A newly sequenced Neanderthal genome is available to download for free, before formal publication, researchers have announced. A team led by Svante Pääbo at the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, has released the genome sequence of a roughly 45,000-year-old Neanderthal bone from Vindija Cave in Croatia before publishing its own analysis of the data (see go.nature.com/2bv1shu). The researchers say that they want to allow other teams to start their own analyses.


Lost Philae found

Photos taken by the mothership of Philae, the European Space Agency’s comet lander, have revealed the craft’s location definitively for the first time. Philae landed on comet 67P/Churyumov-Gerasimenko in November 2014, but it failed to grip the comet’s surface and bounced into a shady spot. The lander’s tilted position meant that its antenna was partially blocked, making communication difficult, and it was unable to use its solar panels to charge its batteries. The pictures, taken by the Rosetta spacecraft’s OSIRIS camera on 2 September, confirm that Philae is lying on its side in the shadow of a cliff, lodged in a crack, with one of its legs in the air (pictured). See go.nature.com/2c90rwu for more.


TREND WATCH

Seaweed farming has undergone astonishing growth over the past 50 years, says a report from the United Nations University released on 4 September. In 2014, the industry produced more than 25 million tonnes of seaweed, mostly for food, with a value of US$6.4 billion. Widening uses for the crop, including in fertilizers and drugs, are driving growth. But seaweed can spread diseases, and few regulations exist to safeguard against this, the report says. The industry should learn from other agricultural sectors, it adds.

Source: E. J. Cottier-Cook et al. Safeguarding the Future of the Global Seaweed Aquaculture Industry (United Nations University, 2016)


BEWARE THE SEAWEED BOOM
Seaweed farming has grown exponentially over the past few decades, but experts warn that regulation is needed to ensure sustainable expansion.


EVENTS

Rocket failure

A Falcon 9 rocket made by commercial-spaceflight company SpaceX exploded on the launch pad at Cape Canaveral in Florida on 1 September. The event occurred two days before the craft was set to carry an Israeli communications satellite into orbit. The cause of the explosion, which happened in the minutes leading up to a planned engine test, is under investigation. The rocket’s payload, the AMOS-6 communications satellite, was also destroyed. It had been intended to provide Internet connectivity across sub-Saharan Africa.


Brexit warning

Japan’s government has issued a 15-page warning over the United Kingdom’s pending exit from the European Union. A memo posted online by Japan’s foreign ministry lists requests from Japanese businesses operating in Britain, including continued access to EU research funding and the ability to take part in EU research projects. The document also warns that if the European Medicines Agency moves from its present location in London, money and researchers could be shifted to continental Europe.


Academics sacked

Under state-of-emergency provisions, Turkey’s government issued a decree on 1 September that sacked 2,346 university staff for alleged ties to an attempted military coup in July. The move is part of a wider purge of 40,000 civil servants who will be excluded from holding any government positions in the future. Academic organizations in Turkey have protested that some of those fired are not part of the Gülen religious movement, which the government says was behind the coup, but opponents of certain government policies. More than 40 had been signatories of the ‘academics for peace’ petition released in January that called for an end to violence between government forces and Kurdish separatists, and which led to the immediate arrests of some signatories.


Gorilla on the brink 
The latest update of the 
IUCN Red List of Threatened 
Species, which tracks the 
health of animal and plant 
populations, has moved 

the eastern gorilla (Gorilla 
beringei) from the endangered 
to the critically endangered 
category, after a 70% crash 

in the primate’s numbers in 
two decades. However, the 
giant panda (Ailuropoda 
melanoleuca) shifted from 
endangered to vulnerable as a 
result of increased protection 
helping its numbers. Two 
plants endemic to Hawaii, 
Cyanea marksii and 
Wikstroemia villosa, were 
listed as critically endangered, 
having previously been 
thought to be extinct. 


Boon for Paris deal 


In a milestone step, the world’s 
two biggest greenhouse-gas 
emitters, the United States 
and China, have ratified the 
Paris climate agreement. Last 
December, nearly 200 nations 
made a deal to cut emissions 
and keep global temperature 
increases to “well below” 

2°C. For the agreement to 
enter into force, 55 nations 
accounting for at least 55% of 
global emissions need to ratify 
the deal. Together, China and 
the United States generate 
some 38% of global carbon 
emissions. Before 3 September, 
only 24 signatories had ratified 
the deal, representing around 
1% of global emissions. The 
move is expected to prompt a 
surge in ratifications. 
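
The entry-into-force rule described above is a two-part threshold, and a short sketch can make the arithmetic concrete. The function name and the way the figures are combined are illustrative assumptions; the inputs are only the rounded numbers quoted in this item (24 earlier ratifications covering about 1% of emissions, plus the United States and China at about 38%).

```python
# Illustrative sketch only: the function name and data layout are assumptions,
# not taken from the agreement text. Inputs are the rounded figures quoted above.

MIN_PARTIES = 55           # at least 55 ratifying nations
MIN_EMISSIONS_SHARE = 55   # covering at least 55% of global emissions (percent)


def enters_into_force(parties: int, emissions_share: float) -> bool:
    """Return True only if both thresholds are met."""
    return parties >= MIN_PARTIES and emissions_share >= MIN_EMISSIONS_SHARE


# Before 3 September: 24 ratifications, about 1% of emissions.
parties_before, share_before = 24, 1.0

# Adding the United States and China (about 38% of emissions together).
parties_after = parties_before + 2
share_after = share_before + 38.0

status = enters_into_force(parties_after, share_after)
print(parties_after, share_after, status)
```

On these rounded figures, the tally would stand at 26 ratifications and roughly 39% of emissions, so both thresholds would still be unmet.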
View from the Mars rover Curiosity at the foot of Aeolis Mons, before the rover starts to climb the mountain.


PLANETARY SCIENCE 


Mars contamination fear 
could divert Curiosity rover 


NASA must keep Earth microbes from getting into hillside streaks suspected to hold water. 


BY ALEXANDRA WITZE 


Four years into its travels across Mars, NASA’s Curiosity rover faces an unexpected challenge: wending its way safely among dozens of dark streaks that could indicate water seeping from the red planet’s hillsides.

Although scientists might love to investigate the streaks at close range, strict international rules prohibit Curiosity from touching any part of Mars that could host liquid water, to prevent contamination. But as the rover begins climbing the mountain Aeolis Mons next month, it will probably pass within a few kilometres of a dark streak that grew and shifted between February and July 2012 in ways suggestive of flowing water.

NASA officials are trying to determine whether Earth microbes aboard Curiosity could contaminate the potential Martian seeps from a distance. If the risk is too high, NASA could shift the rover’s course, but that would present a daunting geographical challenge. There is only one obvious path to the ancient geological formations that Curiosity scientists have been yearning to sample for years (see ‘All wet?’).

“We’re very excited to get up to these layers and find the 3-billion-year-old water,” says Ashwin Vasavada, Curiosity’s project scientist at NASA’s Jet Propulsion Laboratory (JPL) in Pasadena, California. “Not the ten-day-old water.”

The streaks, dubbed recurring slope lineae (RSLs) because they appear, fade away and reappear seasonally on steep slopes, were first reported on Mars five years ago in a handful of places. The total count is now up to 452 possible RSLs. More than half of those are in the enormous equatorial canyon of Valles Marineris, but they also appear at other latitudes and longitudes. “We’re just finding them all over the place,” says David Stillman, a planetary scientist at the Southwest Research Institute in Boulder, Colorado, who leads the cataloguing.


DARK MARKS 

RSLs typically measure a few metres across and hundreds of metres long. One leading idea is that they form when the chilly Martian surface warms just enough to thaw an ice dam in the soil, allowing water to begin seeping downhill. When temperatures drop, the water freezes and the hillside lightens again until next season. But the picture is complicated by factors such as potential salt in the water; brines may seep at lower temperatures than fresher water.

Other possible explanations for the streaks include water condensing from the atmosphere, or the flow of bone-dry debris. “They have a lot of behaviours that resemble liquid water,” says Colin Dundas, a planetary geologist at the US Geological Survey in Flagstaff, Arizona. “But Mars is a strange place, and it’s worth considering the possibility there are dry processes that could surprise us.”

A study published last month used orbital infrared data to suggest that typical RSLs contain no more than 3% water. And other streaky-slope Martian features, known as gullies, were initially thought to be caused by liquid water but are now thought to be formed mostly by carbon dioxide frost.


ALL WET?
NASA’s Curiosity rover is heading towards dark streaks (potential recurring slope lineae, RSLs) that could indicate the presence of water on Mars.

Dundas and his colleagues have counted 58 possible RSLs near Curiosity’s landing site in Gale Crater. Many of them appeared after a planet-wide dust storm in 2007, possibly because the dust acted as a greenhouse and temporarily warmed the surface, Stillman says.

Since January, mission scientists have used the ChemCam instrument aboard the rover, which includes a small telescope, to photograph nearby streaks whenever possible.

So far, the rover has taken pictures of 8 of the 58 locations and seen no changes. The features are lines on slopes, but they have not yet recurred. “We’ve got two of the three letters in the acronym,” says Ryan Anderson, a geologist at the US Geological Survey who leads the imaging campaign.

Curiosity is currently about 5 kilometres away from the potential RSLs; on its current projected path, it would never get any closer than about 2 kilometres, Vasavada says. The rover could not physically drive up and touch the streaks if it wanted to, because it cannot navigate the slopes of 25 degrees or greater on which they appear.

But the rover’s sheer unexpected proximity to potential RSLs has NASA re-evaluating its planetary-protection protocols. Curiosity was only partly sterilized before going to Mars, and experts at JPL and NASA headquarters in Washington DC are calculating how long the remaining microbes could survive in Mars’s harsh atmosphere, as well as what weather conditions could transport them several kilometres away and possibly contaminate a water seep. “That hasn’t been well quantified for any mission,” says Vasavada.

The work is an early test for the NASA Mars rover slated to launch in 2020, which will look for life and collect and stash samples for possible return to Earth. RSLs exist at several of the rover’s eight possible landing sites.

For now, Curiosity is finishing exploring the Murray formation. This area is made of sediments from the bottom of ancient lakes, the sort of potentially life-supporting environment the rover was sent to find. Curiosity’s second extended mission begins on 1 October.

Barring disaster, the rover’s lifespan will be set by its nuclear-power source, which will continue to dwindle in coming years through radioactive decay. Curiosity still has kilometres to scale on Aeolis Mons as it moves towards its final destination, a sulfate-rich group of rocks.
CORRECTION

The News story ‘Mars contamination fear 
could divert Curiosity rover’ (Nature 537, 
145-146; 2016) should have made it clear 
that the dark streaks near Curiosity are only 
‘potential’ recurring slope lineae. And it 
should have said that the Murray formation 
— not the Murray Buttes — was formed 
from ancient lake sediments. 
London super-lab opens
under cloud of Brexit 


Research begins at the unabashedly international Francis Crick Institute. 


BY EWEN CALLAWAY 


“It is amazing, isn’t it,” says Paul Nurse, as he stands on a bridge overlooking the grand atrium of the new Francis Crick Institute in London. Light floods in from the building’s cathedral-like entrance. “I can’t quite believe it’s here.”

Nurse, the institute’s founding director, and his ten lab members are among the first researchers to begin working at the Crick, which opened to the media on 1 September.

The UK government and the Crick’s other funders have gambled £700 million (US$927 million) on the institute, in the hope that it will attract some of the world’s brightest young biomedical researchers and catalyse a boom in the UK life-sciences economy.

The building will eventually house 1,500 scientists and support staff, making it Europe’s largest single-site biomedical institute. They will study a broad portfolio of biomedical research, from immunology to cancer genetics.

The 93,000-square-metre glass and steel temple looms over the neighbouring British Library, the largest public structure built in Britain in the twentieth century. But looming over the Crick is the prospect of Brexit.


WORLD STAGE 

The UK vote on 23 June to leave the European Union poses a range of uncertainties for UK researchers, from access to European funding to the ease of moving between EU countries. “Our vision is to be a major research institute of great significance on the world stage,” says Nurse. “Internationalism is absolutely in our DNA.”

The Crick’s first researchers, who began arriving in mid-August, come mostly from two institutes in London: the National Institute for Medical Research, run by the Medical Research Council, and the London Research Institute, run by Cancer Research UK.

The plan is for the Crick to house a growing and ever-changing roster of young group leaders, who will spend up to 12 years there.

More than half of the Crick’s current postdocs are from EU countries other than the United Kingdom, Nurse notes, and limits to freedom of movement for EU workers could make it harder to recruit. If Britain does not secure access to EU research-funding programmes, that could also limit funding for the Crick’s scientists.

Paul Nurse says that internationalism is in the DNA of London’s Francis Crick Institute.

Jernej Ule, a molecular biologist at University College London who will spend three years at the Crick, is emblematic of Nurse’s international vision. Ule is a native of Slovenia and did his PhD and postdoctoral work in the United States. His lab, which studies how changes in gene expression influence motor neuron disease and other neural conditions, includes scientists from Spain, Italy, France, Germany and the United Kingdom. “For me to recruit the best people, I need to have a capacity to throw a net very broadly,” he says.

Ule also receives EU funding. After he arrived in the United Kingdom, he won a grant from the European Research Council (ERC) in 2007 to study RNA regulatory networks in neurons, then a nascent area of research.

“Having the chance to apply for European funding at this top level is crucial to give us this independence of thinking in very new directions,” he says. “Without the ERC I wouldn’t be where I am right now.”

He and several other scientists who have begun working at the Crick say that the institute’s mission is even more essential in the wake of the Brexit vote.

“It’s almost like we have the Crick in spite of Brexit,” says Matthew Swaffer, a postdoc in Nurse’s lab. “I feel like it portrays the exact opposite sentiment that some people feel Brexit represents.”

Swaffer’s colleague Tiffany Mak, a first-year PhD student, joined the Crick in part because of its allure as a mecca for researchers from a wide variety of disciplines, and that has not diminished. “This project puts so much emphasis on bringing people from all sorts of backgrounds together. Hopefully it will act as a hub and not let politics get in the way of science and collaboration.”

The Crick is likely to experience many of 
the same anxieties over Brexit as other UK 
research institutions, says Kieron Flanagan, a 
science-policy researcher at the University of 
Manchester. 

But the institute’s high profile (some have described it as “too big to fail”) could even buffer it from some Brexit worries, such as the ability to continue to recruit top scientists from Europe, he says. “They may have fewer problems than the university in the middle of nowhere in attracting people, but there will still be that concern there.”


Stem cells are increasingly being used in unproven therapies at clinics in the United States. 


BIOMEDICINE 


Cell-therapy rules 
stir debate 


Controversial US guidelines attempt to rein in rogue stem-cell clinics.


BY HEIDI LEDFORD 


Thomas Albini met his first patient blinded by a stem-cell ‘treatment’ last year. The elderly woman, who had macular degeneration, thought she was paying to participate in a clinical trial that would save her sight by injecting stem cells into both eyes. Instead, it left her legally blind.

By the time Albini, an ophthalmologist at the University of Miami in Florida, had treated two more women who had been blinded by the same procedure, he knew that there was a systemic problem. Two of the women had been lured by a posting in a clinical-trial registry (even though there was no real trial to speak of), and none of the injections had been administered by a physician. The clinic offering the injections claimed that its procedure did not require approval from the US Food and Drug Administration (FDA), in part because it used the patient’s own cells. Altogether, Albini found the cases shocking. “Any sort of review would have been helpful.”

The debate over whether the FDA should review such treatments is growing more intense as purported stem-cell clinics proliferate across the United States. Current FDA regulations are poorly enforced and leave room for various interpretations. On 8 September, Albini will present his experiences at an FDA workshop. The following week, dozens of researchers, companies and patient advocates will flock to Bethesda, Maryland, for an FDA public hearing. Many of them will tout the virtues of unproven stem-cell therapies and insist that people should have the right to such treatments. The FDA has expanded the one-day hearing to two, and moved it to a larger auditorium, in response to overwhelming public interest.

The discussion will focus on FDA proposals that aim to better define which cell therapies deserve strict regulation. If adopted, these controversial guidelines could encompass a large chunk of the cell-therapy clinics that claim to fall largely outside the agency’s purview.

A burgeoning industry has sprung up in the absence of definitive oversight. A recent study of stem-cell clinics that advertise online uncovered 570 such centres operating in the United States (L. Turner and P. Knoepfler Cell Stem Cell 19, 154-157; 2016).

Under FDA regulations, these clinics must prepare and store their therapies safely, and their facilities are subject to sporadic inspections. But many clinics also operate under the assumption that they do not need the agency’s approval to carry out their procedures and do not have to conduct the clinical trials that the FDA normally demands to prove that a therapy works. Agency regulations state that clinics do not need regulatory approval if therapies involve “minimal manipulation” of cells that does not fundamentally alter their properties, and if those cells fulfil a “homologous” function similar to their original role in the body. But the precise definitions of “minimal manipulation” and “homologous use” are controversial.

A series of four FDA draft guidelines released in 2014 and 2015 addressed that ambiguity by providing concrete examples of what would trigger greater FDA oversight. After soliciting public comment, the FDA will decide whether to amend and finalize the proposals.

Not everyone is happy with the results up to now. Arnold Caplan, who studies regenerative medicine at Case Western Reserve University in Cleveland, Ohio, worries that the FDA will start seeking approvals for treatments that are now considered standard, including the use of abdominal fat in breast reconstruction following a mastectomy.

Others are concerned that tighter guidelines will make it harder to bring discoveries to market. “It will potentially slow down translation in many instances,” says Keith March, a cardiologist at Indiana University in Indianapolis, who will also present at the public hearing. “We need to be cognizant of that.”




TOO LATE? 

Some researchers are glad that the FDA is tackling the issue, however. Stem-cell researcher Jeanne Loring at the Scripps Research Institute in La Jolla, California, and her lab are talking to the FDA about starting clinical tests of a stem-cell treatment for Parkinson’s disease. “They’re making sure we know what we’re doing,” she says.

But even if the FDA finalizes the proposals, it is unclear what effect the rules will have, says bioethicist Leigh Turner at the University of Minnesota in Minneapolis. The stem-cell clinics are too entrenched to be chased away by FDA guidance, he says. “The real question is if the FDA is going to send inspectors and issue warning letters.”

For Albini, the proposed FDA guidance is not a perfect solution, but it is at least a step in the right direction. He may never know for sure why the treatments blinded his patients. And he acknowledges that clearer guidelines, and stricter enforcement, will not prevent every such tragedy in the future. Neither will they keep some clinics from recruiting patients under the guise of conducting clinical trials. But every step counts. “The more regulatory hurdles you put in the way of somebody who wants to use the term ‘research’ as marketing, the better off we’d be,” Albini says.
US science faces budget limbo


Likelihood of stopgap spending measure grows in light of upcoming election. 


BY SARA REARDON 


Another year, another round of budget limbo for US science agencies. When Congress returns from its summer break on 6 September, it will have just three weeks to pass a new government funding bill before the 2017 budget year begins on 1 October.

Policy analysts predict that lawmakers will pass a stopgap funding measure that will keep agencies’ budgets flat until the presidential election in November, and perhaps into next year.

That would leave the US National Institutes of Health (NIH), the National Science Foundation and other science agencies in a familiar, if uncomfortable, position: unable to start new programmes or to end old ones without permission from Congress, and unsure about their total funding for the year.

More uncertainty will come early next year, when the next US president takes office and replaces most agency directors. “It will be a transition year, and will be difficult enough”, even without the budget limbo, says Matt Hourihan, director of the research and development budget and policy programme at the American Association for the Advancement of Science in Washington DC.

One major question for agencies is how a budget deal between the House of Representatives and the Senate would reconcile the two bodies’ very different 2017 spending plans. The House has proposed increasing the NIH’s budget by US$1.3 billion over the 2016 level; the Senate has suggested a $2-billion boost. The House spending bill for NASA includes an extra $200 million for the agency’s planetary-science programme compared with the current level, whereas the Senate has proposed cutting the programme’s budget by about $300 million.

Then there is the beleaguered international nuclear-fusion project ITER, which is funded by a consortium that includes the Department of Energy (DOE). The Senate has proposed cutting all US support for ITER in 2017; the House plan would have the United States continue to contribute roughly $115 million per year to ITER, with flat funding for most other DOE programmes.

The House and Senate do agree on some things, however. Neither included money for the White House’s proposed $680-million Cancer Moonshot Initiative. Ben Krinsky, legislative-affairs officer at the Federation of American Societies for Experimental Biology in Washington DC, says that Congress might be more willing to provide funding once it sees the NIH’s final road map for the project, which the agency is due to release later this month.

Meanwhile, the Senate is expected to vote 
this week on legislation that would create a 
$1.1-billion emergency fund for response to 
the Zika virus and research towards a vaccine. 
The US Department of Health and Human 
Services says that its budget for fighting the 
virus has almost run out — even though in 
August it took back $81 million from the 
budgets of the NIH and other agencies to pay 
for Zika response efforts. 

But perhaps the most immediate question for Congress and the science agencies is how long a temporary spending measure would last. The timing will be influenced by the 8 November general election, in which the White House, all 435 House seats and one-third of the Senate are up for grabs. December is often mentioned as a probable end date, but that would require Congress to return for a ‘lame duck’ session after the election. And some conservative lawmakers have proposed that any temporary funding plan should be extended until after the next president takes office.

This would be a problem for the science agencies, says Jason Callahan, space-policy adviser at the Planetary Society in Alexandria, Virginia.

“Everything will increase in cost if there’s uncertainty in the budget,” he says. “It’s bad policy to run the federal government on continuing resolutions, but it’s an election year.”


Mystery surrounds cells 


Samples of popular brain-cancer cell line do not match its 
50-year-old source, puzzling researchers. 


BY ELIE DOLGIN 


Biomedical scientists are often urged to check that their cell lines are not contaminated or mislabelled. But as a recent study shows, any effort to authenticate a cell line is only as good as the reference standard against which the cells are compared.

A cell line that is widely used to study brain cancer does not match the cells used to create the line nearly 50 years ago, or the tumour purported to be its source, researchers reported on 31 August (M. Allen et al. Sci. Transl. Med. 8, 354re3; 2016). In fact, no one is quite sure of the true provenance of the cell line distributed by most cell repositories.

Because few cell lines are ever verified against their primary-source material, “this paper is probably just the tip of the iceberg”, says Christopher Korch, a geneticist at the University of Colorado Denver.

Many groups are trying to tackle the problem of misidentified cell lines to improve the reproducibility of research findings. This year, the US National Institutes of Health started requiring grant applicants to describe how they will authenticate their cell lines. And journals such as Nature have begun to ask authors to check their cells against a database of 475 lines (and counting) that are known to be mixed up.

But no organizations have called for the kind of archival sleuthing that produced the new study. “It’s hard enough to get people to do the standard authentication,” says Leonard Freedman, president of the Global Biological Standards Institute, a non-profit organization in Washington DC that has found that most life scientists never authenticate their cells (L. P. Freedman et al. BioTechniques 59, 189-192; 2015). “This is much more elaborate.”

The cell line in question, U87, was established in 1966 at Uppsala University in Sweden, using tissue from a 44-year-old woman with an aggressive brain cancer known as glioblastoma. U87 has since been used in countless investigations that have yielded around 2,000 scientific papers.

The enthusiasm for U87 initially puzzled Bengt Westermark, a tumour biologist at Uppsala. As a graduate student in the 1970s, he studied eight different brain-cancer cell lines. U87 was “hopeless to work with”, he says, because it grew much more slowly than the others.

Years later, Westermark got his hands on 
the version of U87 that is distributed by the 
American Type Culture Collection (ATCC), 
a cell repository in Manassas, Virginia. He 
could see from the cells’ growth properties 
that this U87 was clearly different from the 
cells that gave him so much grief in graduate 
school. Westermark decided to do a formal 
comparison. 

Fortunately, Uppsala had preserved the tumour tissue that spawned the original cell line. This enabled Westermark’s team to verify the identity of the archival U87 sample in their freezer. The researchers then used DNA-fingerprinting techniques to show that the ATCC’s U87 was different, and that it didn’t match any other cell lines created at Uppsala.

Mindy Goldsborough, ATCC’s chief science 
and technology officer, says the repository 
acquired its U87 line in 1982 from the Memo- 
rial Sloan Kettering Cancer Center in New 
York City, which itself received the cell line
from Uppsala in 1973. By the time it arrived at
the ATCC, U87 had a Y chromosome — even
though it was said to have come from a woman.
This suggests that the mix-up probably happened
at Sloan Kettering or during one of the
hand-overs.

The cell line U87 came from a glioma similar to
this tumour, but beyond that its origin is unknown.

In light of these revelations, the ATCC plans 
to update the background details in its listing 
for U87, which it describes as male. But the 
origin of the U87 line remains a mystery. 

Westermark’s team has conducted a com- 
parison of gene-expression profiles that sug- 
gests that the ATCC cell line came from a 
brain tumour. “It’s bad news that it’s not what 
it should be,” he says, “but it’s good news that
it’s probably a glioblastoma.” This means that 
studies of U87 still reflect brain-cancer biology 
and don't need to be tossed out, he adds. 


Still, many cancer researchers think that it 
is time to move beyond U87 and other ‘clas- 
sical’ cell lines — regardless of their origins 
— because the culture conditions historically 
used to grow the cells change their biological 
nature. Westermark and others now favour 
newer cell lines that have been propagated 
on the types of growth medium that ensure 
genetic and epigenetic stability. Through its 
Human Glioma Cell Culture biobank, Uppsala 
provides these sorts of cell to other researchers 
for a small processing fee. 

“What we've historically used is so poorly 
representative of the human disease,” says 
Howard Fine, a neuro-oncologist at the Weill 
Cornell Brain Tumor Center in New York City. 
“So, any time someone can shoot down the 
[U87] cell line, I’m happy.”


CORRECTIONS 

The News story ‘Who will build the next
LHC?’ (Nature 536, 383-384; 2016) should
have said that souping up the current
LHC would take it to an energy of 28 TeV,
not 20 TeV. And the News Feature ‘Digital
DNA’ (Nature 537, 22-24; 2016) gave
an incorrect size for the 2013 EBI files.
The correct figure is 739 kilobytes, not
739 kilobases.


STEVE GSCHMEISSNER/SPL 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


NEWS FEATURE


A long-running study of 
exceptional children reveals 
what it takes to produce the 
scientists who will lead the 
twenty-first century. 


BY TOM CLYNES 
On a summer day in 1968, professor Julian Stanley met
a brilliant but bored 12-year-old named Joseph Bates.
The Baltimore student was so far ahead of his classmates
in mathematics that his parents had arranged
for him to take a computer-science course at Johns
Hopkins University, where Stanley taught. Even that
wasn't enough. Having leapfrogged ahead of the adults in the class, 
the child kept himself busy by teaching the FORTRAN programming 
language to graduate students. 

Unsure of what to do with Bates, his computer instructor introduced 
him to Stanley, a researcher well known for his work in psychometrics 
— the study of cognitive performance. To discover more about the 
young prodigy’s talent, Stanley gave Bates a battery of tests that included 
the SAT college-admissions exam, normally taken by university-bound 
16- to 18-year-olds in the United States. 

Bates’s score was well above the threshold for admission to Johns 
Hopkins, and prompted Stanley to search for a local high school that 
would let the child take advanced mathemat- 
ics and science classes. When that plan failed, 
Stanley convinced a dean at Johns Hopkins to 
let Bates, then 13, enrol as an undergraduate. 

Stanley would affectionately refer to Bates 
as “student zero” of his Study of Mathemati- 
cally Precocious Youth (SMPY), which would 
transform how gifted children are identified 
and supported by the US education system. 
As the longest-running current longitudi- 
nal survey of intellectually talented children, 
SMPY has for 45 years tracked the careers and 
accomplishments of some 5,000 individuals, 
many of whom have gone on to become high- 
achieving scientists. The study’s ever-growing 
data set has generated more than 400 papers 
and several books, and provided key insights 
into how to spot and develop talent in science, 
technology, engineering, mathematics (STEM) 
and beyond. 

“What Julian wanted to know was, how do you find the kids with the 
highest potential for excellence in what we now call STEM, and how 
do you boost the chance that they'll reach that potential,” says Camilla 
Benbow, a protégé of Stanley’s who is now dean of education and human 
development at Vanderbilt University in Nashville, Tennessee. But 
Stanley wasn't interested in just studying bright children; he wanted 
to nurture their intellect and enhance the odds that they would change 
the world. His motto, he told his graduate students, was “no more dry 
bones methodology”. 

With the first SMPY recruits now at the peak of their careers, what
has become clear is how much the precociously gifted outweigh the rest 
of society in their influence. Many of the innovators who are advanc- 
ing science, technology and culture are those whose unique cognitive 
abilities were identified and supported in their early years through 
enrichment programmes such as Johns Hopkins University’s Center 
for Talented Youth — which Stanley began in the 1980s as an adjunct 
to SMPY. At the start, both the study and the centre were open to young 
adolescents who scored in the top 1% on university entrance exams. 
Pioneering mathematicians Terence Tao and Lenhard Ng were one- 
percenters, as were Facebook’s Mark Zuckerberg, Google co-founder 
Sergey Brin and musician Stefani Germanotta (Lady Gaga), who all 
passed through the Hopkins centre. 

“Whether we like it or not, these people really do control our society,”
says Jonathan Wai, a psychologist at the Duke University Talent
Identification Program in Durham, North Carolina, which collaborates
with the Hopkins centre. Wai combined data from 11 prospective and
retrospective longitudinal studies, including SMPY, to demonstrate
the correlation between early cognitive ability and adult achievement.
“The kids who test in the top 1% tend to become our eminent scientists
and academics, our Fortune 500 CEOs and federal judges, senators and
billionaires,” he says.

Such results contradict long-established ideas suggesting that expert 
performance is built mainly through practice — that anyone can get 
to the top with enough focused effort of the right kind. SMPY, by con- 
trast, suggests that early cognitive ability has more effect on achievement 
than either deliberate practice or environmental factors such as socio- 
economic status. The research emphasizes the importance of nurturing 
precocious children, at a time when the prevailing focus in the United 
States and other countries is on improving the performance of strug- 
gling students (see ‘Nurturing a talented child’). At the same time, the 
work to identify and support academically talented students has raised 
troubling questions about the risks of labelling children, and the short- 
falls of talent searches and standardized tests as a means of identifying 
high-potential students, especially in poor and rural districts. 

“With so much emphasis on predicting who will rise to the top, we 
run the risk of selling short the many kids who are missed by these tests,” 
says Dona Matthews, a developmental psychol- 
ogist in Toronto, Canada, who co-founded the 
Center for Gifted Studies and Education at 
Hunter College in New York City. “For those 
children who are tested, it does them no favours 
to call them ‘gifted’ or ‘ungifted’. Either way, it
can really undermine a child’s motivation to
learn.”


START OF A STUDY 

On a muggy August day, Benbow and her 
husband, psychologist David Lubinski, describe 
the origins of SMPY as they walk across the 
quadrangle at Vanderbilt University. Benbow 
was a graduate student at Johns Hopkins when 
she met Stanley in a class he taught in 1976. 
Benbow and Lubinski, who have co-directed 
the study since Stanley's retirement, brought it 
to Vanderbilt in 1998. 

“In a sense, that brought Julian’s research full
circle, since this is where he started his career as a professor,” Benbow
says as she nears the university’s psychology laboratory, the first US 
building dedicated to the study of the field. Built in 1915, it houses a 
small collection of antique calculators — the tools of quantitative psy- 
chology in the early 1950s, when Stanley began his academic work in 
psychometrics and statistics. 

His interest in developing scientific talent had been piqued by one 
of the most famous longitudinal studies in psychology, Lewis Terman’s 
Genetic Studies of Genius. Beginning in 1921, Terman selected teenage
subjects on the basis of high IQ scores, then tracked and encouraged
their careers. But to Terman’s chagrin, his cohort produced only a few 
esteemed scientists. Among those rejected because their IQ of 129 was 
too low to make the cut was William Shockley, the Nobel-prizewinning 
co-inventor of the transistor. Physicist Luis Alvarez, another Nobel win- 
ner, was also rejected. 

Stanley suspected that Terman wouldn't have missed Shockley and 
Alvarez if he’d had a reliable way to test them specifically on quantitative
reasoning ability. So Stanley decided to try the Scholastic Aptitude
Test (now simply the SAT). Although the test is intended for older stu- 
dents, Stanley hypothesized that it would be well suited to measuring the 
analytical reasoning abilities of elite younger students. 

In March 1972, Stanley rounded up 450 bright 12- to 14-year-olds 
from the Baltimore area and gave them the mathematics portion of 
the SAT. It was the first standardized academic ‘talent search’. (Later,
researchers included the verbal portion and other assessments.) 

“The first big surprise was how many adolescents could figure out
math problems that they hadn't encountered in their course work,” says
developmental psychologist Daniel Keating, then a PhD student at Johns
Hopkins University. “The second surprise was how many of these young
kids scored well above the admissions cut-off
for many elite universities.”

Stanley hadn't envisioned SMPY as a multi-decade
longitudinal study. But after the first
follow-up survey, five years later, Benbow 
proposed extending the study to track subjects 
through their lives, adding cohorts and includ- 
ing assessments of interests, preferences, and 
occupational and other life accomplishments. 
The study’s first four cohorts range from the 
top 3% to the top 0.01% in their SAT scores. 
The SMPY team added a fifth cohort of the 
leading mathematics and science graduate stu- 
dents in 1992 to test the generalizability of the 
talent-search model for identifying scientific 
potential. 

“I don't know of any other study in the world
that has given us such a comprehensive look at 
exactly how and why STEM talent develops,” 
says Christoph Perleth, a psychologist at the 
University of Rostock in Germany who studies 
intelligence and talent development. 


SPATIAL SKILLS 

As the data flowed in, it quickly became 
apparent that a one-size-fits-all approach to 
gifted education, and education in general, 
was inadequate. 

“SMPY gave us the first large-sample basis 
for the field to move away from general intel- 
ligence toward assessments of specific cogni- 
tive abilities, interests and other factors,” says 
Rena Subotnik, who directs the Center for 
Gifted Education Policy at the American Psy- 
chological Association in Washington DC. 

In 1976, Stanley started to test his sec- 
ond cohort (a sample of 563 13-year-olds 
who scored in the top 0.5% on the SAT) on 
spatial ability — the capacity to understand 
and remember spatial relationships between 
objects. Tests for spatial ability might include matching objects that
are seen from different perspectives, determining which cross-section
will result when an object is cut in certain ways, or estimating water
levels on tilted bottles of various shapes. Stanley was curious about
whether spatial ability might better predict educational and
occupational outcomes than could measures of quantitative and verbal
reasoning on their own.

Follow-up surveys — at ages 18, 23, 33 and 48 — backed up his hunch.
A 2013 analysis found a correlation between the number of patents and
peer-refereed publications that people had produced and their earlier
scores on SATs and spatial-ability tests. The SAT tests jointly accounted for
about 11% of the variance; spatial ability accounted for an additional 7.6%.

The findings, which dovetail with those of other recent studies, suggest
that spatial ability plays a major part in creativity and technical
innovation. “I think it may be the largest known untapped source of
human potential,” says Lubinski, who adds that students who are only
marginally impressive in mathematics or verbal ability but high in
spatial ability often make exceptional engineers, architects and surgeons.
“And yet, no admissions directors I know of are looking at this, and it’s
generally overlooked in school-based assessments.”

Although studies such as SMPY have given educators the ability
to identify and support gifted youngsters, worldwide interest in this
population is uneven. In the Middle East and east Asia, high-performing
STEM students have received significant attention over the past
decade. South Korea, Hong Kong and Singapore screen children for
giftedness and steer high performers into innovative programmes.
In 2010, China launched a ten-year National Talent Development Plan
to support and guide top students into science, technology and other
high-demand fields.

In Europe, support for research and educational programmes for gifted
children has ebbed, as the focus has moved more towards inclusion.
England decided in 2010 to scrap the National Academy for Gifted and
Talented Youth, and redirected funds towards an effort to get more poor
students into leading universities.


BRIGHT START
Nurturing a talented child

“Setting out to raise a genius is the last thing we'd advise any parent
to do,” says Camilla Benbow, dean of education and human development
at Vanderbilt University in Nashville, Tennessee. That goal, she says,
“can lead to all sorts of social and emotional problems”.
Benbow and other talent-development researchers offer the following
tips to encourage both achievement and happiness for smart children.

• Expose children to diverse experiences.
• When a child exhibits strong interests or talents, provide
opportunities to develop them.
• Support both intellectual and emotional needs.
• Help children to develop a ‘growth mindset’ by praising effort, not
ability.
• Encourage children to take intellectual risks and to be open to
failures that help them learn.
• Beware of labels: being identified as gifted can be an emotional
burden.
• Work with teachers to meet your child’s needs. Smart students often
need more-challenging material, extra support or the freedom to learn
at their own pace.
• Have your child’s abilities tested. This can support a parent’s
arguments for more-advanced work, and can reveal issues such as
dyslexia, attention-deficit/hyperactivity disorder, or social and
emotional challenges. T.C.


ON THE FAST TRACK

When Stanley began his work, the choices for bright children in the
United States were limited, so he sought out environments in which
early talent could blossom. “It was clear to Julian that it’s not enough
to identify potential; it has to be developed in appropriate ways if
you're going to keep that flame well lit,” says Linda Brody, who studied
with Stanley and now runs a programme at Johns Hopkins focused on
counselling profoundly gifted children.

At first, the efforts were on a case-by-case basis. Parents of other
bright children began to approach Stanley after hearing about his work
with Bates, who thrived after entering university. By 17, he had earned
bachelor’s and master’s degrees in computer science and was pursuing a
doctorate at Cornell University in Ithaca, New York. Later, as a
professor at Carnegie Mellon University in Pittsburgh, Pennsylvania, he
would become a pioneer in artificial intelligence.

“I was shy and the social pressures of high school wouldn't have made
it a good fit for me,” says Bates, now 60. “But at college, with the
other science and math nerds, I fit right in, even though I was much
younger. I could grow up on the social side at my own rate and
also on the intellectual side, because the faster pace kept me interested
in the content.”

The SMPY data supported the idea of accelerating fast learners by 
allowing them to skip school grades. In a comparison of children who 
bypassed a grade with a control group of similarly smart children who 
didnt, the grade-skippers were 60% more likely to earn doctorates or 
patents and more than twice as likely to get a PhD in a STEM field.
Acceleration is common in SMPY’s elite 1-in-10,000 cohort, whose 
intellectual diversity and rapid pace of learning make them among the 
most challenging to educate. Advancing these students costs little or 
nothing, and in some cases may save schools money, says Lubinski. 
“These kids often don't need anything innovative or novel,” he says, 
“they just need earlier access to what's already available to older kids.” 

Many educators and parents continue to believe that acceleration 
is bad for children — that it will hurt them socially, push them out of 
childhood or create knowledge gaps. But education researchers gener- 
ally agree that acceleration benefits the vast majority of gifted children 
socially and emotionally, as well as academically and professionally.

Skipping grades is not the only option. SMPY researchers say that 
even modest interventions — for example, access to challenging material 
such as college-level Advanced Placement courses — have a demon- 
strable effect. Among students with high ability, those who were given 
a richer density of advanced precollegiate educational opportunities in 
STEM went on to publish more academic papers, earn more patents and 
pursue higher-level careers than their equally smart peers who didn’t
have these opportunities.

Despite SMPY’s many insights, researchers still have an incomplete 
picture of giftedness and achievement. “We don’t know why, even at 
the high end, some people will do well and others won't,’ says Douglas 
Detterman, a psychologist who studies cogni- 
tive ability at Case Western Reserve Univer- 
sity in Cleveland, Ohio. “Intelligence won't 
account for all the differences between peo- 
ple; motivation, personality factors, how hard 
you work and other things are important.” 

Some insights have come from German studies that have a methodology
similar to SMPY’s. The Munich Longitudinal Study of Giftedness, which
started tracking 26,000 gifted students in the mid-1980s, found that
cognitive factors were the most predictive, but that some personal
traits — such as motivation, curiosity and ability to cope with
stress — had a limited influence on performance. Environmental factors,
such as family, school and peers, also had an impact.

The data from such intellectual-talent searches also contribute to
knowledge of how people develop expertise in subjects. Some researchers
and writers, notably psychologist Anders Ericsson at Florida State
University in Tallahassee and author Malcolm Gladwell, have popularized
the idea of an ability threshold. This holds that for individuals
beyond a certain IQ barrier (120 is often cited), concentrated practice
time is much more important than additional intellectual abilities in
acquiring expertise. But data from SMPY and the Duke talent programme
dispute that hypothesis (see ‘Top of the charts’). A study published
this year compared the outcomes of students in the top 1% of childhood
intellectual ability with those in the top
0.01%. Whereas the first group gain advanced degrees at about 25 times
the rate of the general population, the more elite students earn PhDs at
about 50 times the base rate.


Top of the charts
Long-term studies of gifted students — those who scored in the top 1%
as adolescents on the mathematics section of the SAT — reveal that
people at the very top of the range went on to outperform the others.
(Chart of outcomes, including any doctorate, STEM publications and STEM
doctorates.)


But some of the work is controversial. In North America and Europe,
some child-development experts lament that much of the research on
talent development is driven by the urge to predict who will rise to the
top, and educators have expressed considerable unease about the concept
of identifying and labelling a group of pupils as gifted or talented.

“A high test score tells you only that a person has high ability and
is a good match for that particular test at that point in time,” says
Matthews. “A low test score tells you practically nothing,” she says,
because many factors can depress students’ performance, including
their cultural backgrounds and how comfortable they are with taking
high-stakes tests. Matthews contends that when children who are
near the high and low extremes of early achievement feel assessed in
terms of future success, it can damage their motivation to learn and
can contribute to what Stanford University psychologist Carol Dweck
calls a fixed mindset. It’s far better, Dweck says, to encourage a growth
mindset, in which children believe that brains and talent are merely a
starting point, and that abilities can be developed through hard work
and continued intellectual risk-taking.

“Students focus on improvement instead of worrying about how
smart they are and hungering for approval,” says Dweck. “They work
hard to learn more and get smarter.” Research by Dweck and her
colleagues shows that students who learn with this mindset show greater
motivation at school, get better marks and have higher test scores.

Benbow agrees that standardized tests should not be used to limit
students’ options, but rather to develop learning and teaching strategies
appropriate to children’s abilities, which allow students at every level to
reach their potential.

Next year, Benbow and Lubinski plan to launch a mid-life survey
of the profoundly gifted cohort (the 1 in 10,000), with an emphasis
on career achievements and life satisfaction, and to re-survey their
1992 sample of graduate students at leading US universities. The
forthcoming studies may further erode the enduring misperception that
gifted children are bright enough to succeed on their own, without
much help.

“The education community is still resistant to this message,” says
David Geary, a cognitive developmental psychologist at the University
of Missouri in Columbia, who specializes in mathematical learning.
“There's a general belief that kids who have advantages, cognitive or
otherwise, shouldn't be given extra encouragement; that we should focus
more on lower-performing kids.”

Although gifted-education specialists herald the expansion of
talent-development options in the United States, the benefits have
mostly been limited so far to students who are at the top of both the
talent and socio-economic curves.

“We know how to identify these kids, and we know how to help them,”
says Lubinski. “And yet we're missing a lot of the smartest kids in the
country.”

As Lubinski and Benbow walk through the quadrangle, the clock strikes
noon, releasing packs of enthusiastic adolescents racing
towards the dining hall. Many are partici- 
pants in the Vanderbilt Programs for Talented 
Youth, summer enrichment courses in which 
gifted students spend three weeks gorging 
themselves on a year’s worth of mathematics, 
science or literature. Others are participants in Vanderbilt's sports camps. 

“They're just developing different talents,” says Lubinski, a former
high-school and college wrestler. “But our society has been much more
encouraging of athletic talents than we are of intellectual talents.”

And yet these gifted students, the ‘mathletes’ of the world, can shape 
the future. “When you look at the issues facing society now — whether 
it’s health care, climate change, terrorism, energy — these are the kids 
who have the most potential to solve these problems,” says Lubinski. 
“These are the kids we'd do well to bet on.”


*Students were split 
into quartiles (Q1-4) on 
the basis of their maths 
SAT score at age 13. 
Tom Clynes is a journalist and the author of The Boy Who Played
With Fusion: Extreme Science, Extreme Parenting and How to Make 
a Star. 
ADVANCES IN CATALYST RESEARCH COULD CREATE A SUPERHIGHWAY TO CLEAN ENERGY
SOURCES AND A MORE-SUSTAINABLE CHEMICAL INDUSTRY. 


BY XIAOZHI LIM 


In her 1794 book, An Essay on Combustion, Scottish chemist Elizabeth
Fulhame noted a peculiar fact: substances such as coal and charcoal
burned better when they were damp. After many experiments to
understand why, she concluded that the water briefly split into hydrogen
and oxygen, which interacted with the other compounds in a way
that made the combustion go faster. Yet at the end, Fulhame wrote,
the process “forms a new quantity of water equal to that decomposed”.
Many historians consider this to be the first scientific account of a
catalyst: a material that speeds reactions by making or breaking chemical
bonds, without being consumed. It was hardly the last: modern
chemistry would be almost inconceivable without catalysts. “They not
only make transformations accessible, but also direct them in new ways,”
says Susannah Scott, a chemist at the University of California, Santa
Barbara. “That's very powerful.”

Catalysts are used in some 90% of processes in the chemical industry,
and are essential for the production of fuels, plastics, drugs and
fertilizers. At least 15 Nobel prizes have been awarded for work on catalysis.
And thousands of chemists around the world are continually improving
the catalysts they have and striving to invent new ones.

That work is partly driven by an interest in sustainability. The aim of
catalysis is to direct reactions along precisely defined pathways so that
chemists can skip reaction steps, reduce waste, minimize energy use and 
do more with less. And with growing concerns about climate change 
and the environment, sustainability has become increasingly important. 
Catalysis is a key principle of ‘green chemistry’: an industry-wide effort 
to prevent pollution before it happens. 

Catalysts are also seen as the key to unlocking energy sources that 
are much more inert and difficult to use than coal, oil or gas, but much 
cleaner. Catalysis can make it more economically feasible to split water 
into oxygen and hydrogen fuel, or can open up new ways to use raw 
materials such as biomass or carbon dioxide. “These are feedstocks that 
are ripe for advances in catalysis,” says Melanie Sanford, a chemist at the 
University of Michigan in Ann Arbor. 

These challenges have led to an explosion in catalyst innovation, 
with the annual number of publications on the subject tripling in the 
past decade. Many groups are coming up with new small-molecule 
complexes or are chemically tailoring biological enzymes in search of 
radically new catalytic activity. Others are pursuing advances in nano- 
technology, which allow them to engineer the action of solid catalysts 
at the atomic scale. Still others are experimenting with catalysts that 
are activated by light, or that incorporate the DNA double helix. And
everyone in the field is trying to streamline the search for better catalysts 
with modern computational modelling tools. 

The pace of innovation is such that even the experts are struggling to 
keep up, says Scott, who leads the US Department of Energy’s efforts to 
develop benchmarks for the new catalysts’ performance. “We need to
make sure we are advancing the science that’s most efficient,” she says.

And the scope of catalysis is increasing rapidly. “Twenty years ago,” 
says John Hartwig, a chemist at the University of California, Berkeley, 
“catalysis to make molecules that were complex did not exist.” Anyone 
who wanted to modify a large complicated structure would have to tear 
it down and build it back up, says Sanford. But now, chemists can often 
edit parts of a molecule precisely. “It’s incredibly enabling,” she says. 


CUT-PRICE CATALYSTS 

Using a catalyst is like bulldozing a shortcut between reactant A and
product B, bypassing convoluted chemical pathways that might other- 
wise take forever. Using a really good catalyst is like building a multilane 
superhighway. And some of the best are the ‘homogeneous’ catalysts: 
free-floating molecules that are mixed in with the reactants. 

Industrial catalysts in this category most often consist of a metal 
ion that does the hard work of making or breaking chemical bonds, 
surrounded by ‘ligands’: connected groups, often carbon-based, that 
control the reactants’ access to the ion. Much of the research in this 
field comes down to tailoring these ligands to produce a catalyst that 
performs only a desired reaction. 

Unfortunately, many of the successes so far have come through 
the use of scarce and expensive metals such as palladium, platinum, 
ruthenium and iridium. Today, chemists are increasingly striving to 
build catalysts around cheaper, ‘Earth-abundant’ elements such as iron,
nickel or copper — or to do without metals altogether. 

Nickel is a particularly attractive candidate for mimicking the 
chemistry of palladium and platinum because it sits directly above them 
in the periodic table, and therefore has similar properties. At the Swiss 
Federal Institute of Technology in Lausanne, for example, synthetic 
chemist Xile Hu and his group are working with a remarkably versatile 
nickel complex that they first reported in 2008. The complex consists
of a nickel ion surrounded by a single, large ligand that binds to it in 
three places, leaving a fourth binding spot available for catalysing reac- 
tions. A similar ligand is already used in certain palladium catalysts. 
But the radius of a nickel ion is almost 20% smaller than that of a pal- 
ladium ion, so Hu had to shrink the ligand to fit it more closely around 
the nickel. To do so, he replaced phosphorus atoms in the ligand with 
smaller nitrogen ones. 

The result is a rigid ligand that stabilizes the nickel ion as it performs a 
wide array of reactions. The original nickel catalyst is already available 
commercially, and Hu is systematically modifying the ligand to make 
a whole family of catalysts. 

In 2008, chemists discovered that certain standard catalysts could 
be made more powerful by combining them with a technique known 
as photoredox catalysis. When photoredox catalysts absorb light, an 
electron leaps from the metal ion to the ligand and becomes stuck there, 
leaving the molecule in an unstable state. “The catalyst becomes desper- 
ate to fill the hole in the metal and get rid of the electron in the ligand,” 
explains David MacMillan, a chemist at Princeton University in New 
Jersey who first reported the idea in collaboration with chemist David 
Nicewicz from the University of North Carolina at Chapel Hill. But 
the only way the photoredox system can accomplish this is to trade 
electrons with the standard catalyst, supercharging it and triggering 
chemical transformations that were previously impossible. As a bonus, 
the photoredox catalysis drives the process with energy that it absorbed 
from light, reducing the heat required to keep the reaction going. 

Nicewicz and MacMillan have independently used photoredox 
catalysis to make major improvements to the Buchwald–Hartwig 
reaction, which is frequently used to bond carbon with nitrogen when 
making drugs. Typically, the reaction requires the use of palladium salts, 
expensive, phosphorus-based ligands and difficult-to-make reactants. 
But in 2015, Nicewicz’s group announced that it had not only made a 
carbon–nitrogen bond using a completely metal-free catalyst, but had 
done so starting from cheaper and more accessible reactants; the method is already 
being used by pharmaceutical companies, says Nicewicz. In June, MacMillan’s 
group and its collaborators at Merck Research Laboratories in 
Rahway, New Jersey, reported making the Buchwald–Hartwig reaction 
work with minute amounts of an iridium light absorber and a nickel salt, 
eliminating the need for ligands. 

[Figure: Attaching catalyst nanoparticles to different faces of a semiconductor crystal allows light to split water, but keeps the explosive products (hydrogen and oxygen) separate.] 

A specific challenge for many researchers is to find better ways 
of creating the carbon-fluorine bonds at the heart of fluorinated 
compounds that are widely used in pharmaceuticals, agrochemicals 
and medical imaging. Currently, the bonds are made using expensive 
specialized reagents or the highly corrosive gas hydrogen fluoride. In 
2013, a team of researchers led by Sanford showed how to make such 
bonds with a safer potassium fluoride salt using a copper catalyst. First, 
the catalyst is exposed to a compound that strips away three of its elec- 
trons. This leaves the catalyst so hungry for electrons that it can pull 
some from a nearby fluoride ion, which holds them in a notoriously 
tight grip. The fluoride is then so desperate for a replacement electron 
that it will readily bind with a carbon atom to get it. 


PEBBLES IN A STREAM 

Despite their versatility, many homogeneous catalysts are fragile. Their 
internal bonds weaken after prolonged exposure to heat and collisions 
with reactant molecules, and their ligands start disintegrating. “They 
die after a while,” says Sanford. 

That is a big reason why large-scale industry tends to use 
‘heterogeneous’ catalysts: solid materials that are fixed in place while 
the reactants stream past. A classic example is the mix of powdered 
platinum and other metals found in the catalytic converters that clean 
vehicle exhaust gases. In the past, chemists had a tough time designing 
heterogeneous catalysts with atomic precision because it was difficult 
to make and study the active sites, where catalysis occurs, in a solid 
material. Mostly they had to optimize the catalysts through trial and 
error. But what’s changing, says Scott, “is the synthetic control that we 
can exert over the materials”. In particular, rapid advances in nanotechnology 
are allowing chemists to work towards systems with the robustness 
of solid catalysts and the high performance of homogeneous ones. 

At the Chinese Academy of Sciences’ State Key Laboratory of Catal- 
ysis in Dalian, director Can Li has used platinum and cobalt oxide 
nanoparticles to create a catalyst for splitting water with sunlight (see 
‘Light splitter’). He starts by sticking the nanoparticles to crystals of a 
semiconductor called bismuth vanadium oxide, with each type of par- 
ticle carefully isolated on a specific face of each crystal. Then, when he 
immerses the crystals in water and exposes them to light, photons strike 
the semiconductor and loosen electrons. The result is a flow of current 
that the nanoparticles use to break water molecules into hydrogen and 
oxygen. Oxygen gas comes bubbling off the cobalt oxide sites, while 
positively charged hydrogen ions migrate to the platinum particles. “We 
separated the active sites to block the reverse reaction,” says Li: that is, 
a dangerously explosive conversion of hydrogen 
and oxygen back into water. (To simplify 
the experimental set-up, the hydrogen ions are 
currently captured by a separate compound 
rather than turned into gas.) The process is not 
yet efficient enough to be economically viable, 
says Li. But his team is testing combinations of 
semiconductors and metal catalysts to refine 
the design. 

Audrey Moores, a chemist at McGill Uni- 
versity in Montreal, Canada, is tackling a bothersome issue in the 
pharmaceutical, cosmetics and food industries, which often use heavy- 
metal-ion catalysts. Ions of palladium, ruthenium and platinum are 
toxic, so products made with them cannot be sold until they have been 
through a series of meticulous and expensive cleansing steps. Moores 
is working on alternative catalysts based on iron, which is much safer. 

In 2014, her research group prepared a series of hollow, magnetic 
iron oxide nanoparticles for making benzaldehyde, a molecule that 
smells like almonds and is widely used in flavourings. It is typically 
manufactured by reacting certain electron-hungry compounds with 
styrene: a sweet-smelling but hazardous liquid that is better known as a 
raw material for plastics. The process tends to generate a relatively small 
amount of benzaldehyde mixed with other molecules. But Moores’s iron 
nanoparticles catalyse a more controllable reaction between styrene and 
oxygen, yielding almost pure benzaldehyde. And as an added advantage, 
iron is magnetic, so at the end of the reaction the iron nanoparticles can 
be extracted for reuse with a magnet. 


EVEN-HANDEDNESS 

When making large, complex molecules such as steroids, antibiotics or 
hormones, a major challenge involves chirality, or the ‘handedness’ of 
a carbon atom. Such an atom carrying four different groups can have 
two configurations that are mirror images of each other, like human 
hands. A complex molecule may contain many such carbon atoms — 
and if even one of them has the wrong configuration, the compound can 
end up interacting badly with the human body. One notorious exam- 
ple is thalidomide, a drug developed in the 1950s for treating morning 
sickness in pregnant women. One chiral configuration was effective 
and safe for that purpose. But its mirror image, which was present in 
the over-the-counter drug, caused babies to be born with severe limb 
deformations. 

Molecules from biomass feedstocks contain a wide variety of chiral 
carbon atoms in a chain, and it is almost impossible to distinguish one 
from another. “A small-molecule catalyst wouldn’t recognize it,” says 
Hartwig. Instead, chemists are turning to biological enzymes, which can 
be large enough to recognize the overall shape of the target molecule 
and home in on the bond where the reaction should occur. Enzymes 
also have the advantage of using water as a solvent and working at body 
temperatures, which makes them more environmentally friendly than 
processes that require toxic solvents and large amounts of heat. 

Naturally occurring enzymes don’t always catalyse the reactions that 
chemists want, however — which is why one frontier of catalysis research 
is to rework these proteins so that they do. Hartwig has been looking at 
the haem enzyme, which is similar to the compounds that carry oxygen 
in red blood cells, and has developed an artificial enzyme that substitutes 
an iridium complex for the haem’s iron centre. Although this runs 
contrary to the goal of replacing precious metals with Earth-abundant 
ones, says Hartwig, iridium can work with strong bonds such as those 
between carbon and hydrogen, which iron cannot. His team is using 
crystallographic data to study the enzymes’ structures near the iridium 
site and is systematically modifying them so that they can precisely 
transform a carbon-hydrogen bond into a carbon-carbon bond with 
the desired chiral configuration — a formidable challenge. The chemists 
can prepare hundreds or even thousands of new enzymes in this way, 
limited only by the time it takes to test them and analyse their activity. 

Still, enzymes are very specific to their target, and although they yield 
a product with a single chiral configuration, it 
is often the configuration that isn’t wanted. “If 
you’re interested in the other, you’re in trouble,” 
says Stellios Arseniyadis, a synthetic chemist at 
Queen Mary University of London. To address 
that problem, Arseniyadis is collaborating with 
Michael Smietana of the University of Mont- 
pellier in France to make catalysts from DNA. 
Although most natural DNA spirals in only one 
direction, it is possible to make an artificial ver- 
sion that twists in the opposite direction. The two researchers and their 
teams make their catalysts by choosing a natural or non-natural helix of 
DNA and then attaching a metal ion inside it. The spiral grooves align 
the reactants so that they fuse with the desired chiral configuration. In 
2015, Arseniyadis and Smietana reported a recyclable DNA-copper 
catalyst that created the correct chiral products as reactants flowed 
past. With endless combinations of base pairs and metal ions, “there’s a 
plethora of parameters that you can fine-tune”, says Arseniyadis. 

Chemists are continuing to push the boundaries of catalysis research. 
Li, for example, is experimenting with housing enzymes inside nanoparticles 
to help them last longer. Others are synthesizing completely 
artificial enzymes using techniques from synthetic biology. And 
earlier this year, an international team of researchers reported using 
an electric field to catalyse the formation of ring-shaped carbon com- 
pounds. These ideas are starting to constitute entire new research fields 
in which conventionally distinct disciplines overlap — for example, 
combining chemical synthesis and DNA. That, says Arseniyadis, leaves 
“a lot of room for serendipity”. 


XiaoZhi Lim is a freelance writer based in Singapore. 
[Figure: Antibiotic use in livestock has contributed to drug resistance around the world.] 


Use antimicrobials wisely 


The United Nations must reframe action on antimicrobial resistance as the defence of 
a common resource, argue Peter S. Jørgensen, Didier Wernli and colleagues. 


The effectiveness of antibiotics has been waning since they were introduced 
into modern medicine more than 70 years ago. Today, our inability to 
treat infections ranks alongside climate change as a global threat. New classes of 
antimicrobial drugs are unlikely to become widely available any time soon; if and 
when they do, bacteria, viruses and other microbes will again evolve resistance. In 
any case, waging war on microbes is not tenable: our bodies and planet depend 
on them (see Supplementary Information; go.nature.com/2c03p6n). 

Addressing resistance requires global 
collective action. Like the ozone layer, a 
stable climate or biodiversity, the global pop- 
ulation of susceptible microbes is a common 
pool resource — one shared by all. But no 
individual or country has a strong enough 
incentive to conserve this ‘commons’. It has 
been depleted by the massive use of antimicrobial 
compounds and the growing competitive 
advantage of resistant microbes. It is 
a classic ‘tragedy of the commons’. 

This intimate relationship with micro- 
organisms predates modern humans. It 
is the result of many millions of years of 
co-evolution. Our bodies need particular 
kinds of microbes for digestion, immune 
function and general health. Equally, 
microbes support planetary health, for 
example, through nutrient cycles, including 
those that maintain soil and water quality. 
In other words, microbes sustain human 
civilization. Yet our understanding of the 
complex interactions and uncertainties that 
govern the relationships between humans 
and microbes is limited. 

The 2015 Global Action Plan on Antimicro- 
bial Resistance, drafted by the World Health 
Organization (WHO) with support from 
the United Nations Food and Agriculture 
Organization (FAO) and the World Organisa- 
tion for Animal Health (OIE), recognizes the 
need for multisectoral cooperation to address 
resistance (see go.nature.com/2bbijap). But, 
in our view, it does not go far enough in rec- 
ognizing the life support we receive from 
the global microbiome. Tackling resistance 
urgently requires the scaling back of the 
massive overuse of antibiotics to secure the 
liveability of Earth in the long term. 

On 21 September, heads of state will meet 
to take further action at the United Nations 
high-level meeting on antimicrobial resist- 
ance in New York City. A UN declaration 
currently under discussion must set global 
targets, accelerate implementation of the 
global action plan, plug its gaps and ensure 
stronger accountability and interagency 
coordination. It must emphasize the many 
benefits of microbes. 

Parties should aim to build the resilience of 
society and the microbiome. In our opinion, 
this is the way to maintain low levels of resist- 
ance amid the many surprises of a rapidly 
changing planet. Advances from studying 
resilience in other common pool resources 
such as fisheries and forests suggest key steps 
for antimicrobial resistance, which we set out 
below. Achieving these will require changes 
to institutions, regulations, education, com- 
munity norms and expectations, notably in 
medicine and agriculture. 


EDUCATE TO LEARN 

Until now, political and financial invest- 
ments have focused largely on creating 
incentives to fuel drug innovation and new 
or faster diagnostics. Currently, such tech- 
nological fixes appeal to and benefit mainly 
rich nations in the ‘global north’. Incentives 
must be targeted not only to benefit large 
pharmaceutical companies in the north, 
but also to enlist research and development 
efforts globally. One of the most important 
outcomes of the UN meeting should be 
national commitments to the broadest and 
most creative participatory education campaigns 
about resistance and the importance 
of the microbial world. 

Why? Because the level of ignorance about 
the calamity that is antimicrobial resistance 
is staggering. A 2015 WHO survey 
across 12 countries found that 64% of the 
public think that antibiotics also work against 
viral infections such as influenza and colds 
(see go.nature.com/2c7zvfu). 
Such basic knowledge gaps lead patients and 
physicians to reach for antibiotics without 
appreciating the costs. 

[Figure: Limited access to quality antimicrobials in the developing world drives unregulated sales.] 

Instead, institutions and citizens must 
understand the central facts, context and 
risks in a way that allows them to learn 
more independently. This goal requires 
awareness campaigns to be revised and 
scaled up by orders of magnitude, as well 
as investment in new communication tools. 
Initiated in 2007, Thailand’s Antibiotics 
Smart Use project sets a direction for 
upscaling. It enables patients in pharmacies 
to self-diagnose on the basis of the appear- 
ance of their sore throat to verify whether 
they need antibiotic treatment. For further 
learning, citizen-science programmes 
in which participants monitor their own 
microbiomes should be extended to cover, 
for example, self-testing for resistance in 
various parts of the body. 

Such campaigns could engage commu- 
nities and change norms about how and 
when to use antibiotics. Campaigns will 
need to be coordinated internationally for 
quality and impact, and adapted to suit 
regional perspectives. Engagement can be 
spread through schools, mass media and 
social media. 


JOIN UP 
Resistance affects animal and environ- 
mental health as well as human health, 
and so requires coordinated action across 
economic sectors. No single concern 
exemplifies this better than the high rate 
of antibiotic use in agriculture (largely for 
growth promotion or disease prevention). 
In the United States, 70–80% of all antimicrobials 
consumed are given to livestock; 
agricultural use in the BRICS emerging 
economies (Brazil, Russia, India, China 
and South Africa) is expected to double by 
2030, compared with 2010 levels (see ‘Farm 
forecast’). As a result, antibiotics and resist- 
ance genes enter the food chain, soil and the 
water table, threatening human health. 
The European Union has phased out the 
use of medically important antibiotics for 
growth promotion in agriculture. Other 
countries, including Mexico and Taiwan, 
have sought to reduce it. In the United States, 
a directive discourages the use of antibiotics 
for growth promotion through voluntary 
measures and stronger veterinary oversight 
of therapeutic use. However, the powerful 
industrial farming lobby and a lack of per- 
ceived urgency have so far stalled stronger 
mandates. 

Stronger political action to change how we 
use antibiotics, whether in humans or animals, 
requires citizens to be better informed. 
For instance, the public should have online 
access to surveillance that tracks how human 
resistance increases in settlements near farms. 
In the meantime, consumer groups play a crucial 
part by calling on retail chains to switch 
where their meat is sourced. For example, 
US food chains Chipotle, McDonald's and 
Chick-fil-A have responded (to varying 
degrees) to public demands with stricter lim- 
its on antibiotic use in the meat they sell. 

A particularly worrying issue that is not 
confined to the use of antimicrobials in food 
production is the international spread of 
resistance genes, especially those conferring 
resistance to many drugs of ‘last resort’. Most 
recently, a mobile plasmid gene carrying 
resistance to the last-resort antibiotic colistin 
has been found in Asia, Europe and North 
America. Clearly, countries cannot act alone 
to deal with the problem without jeopard- 
izing the benefits of globalization. 

Much better surveillance and contain- 
ment is needed of the most dangerous 
multiresistant strains in people and food. A 
global routine-surveillance initiative could 
help to prevent the spread of resistance. It 
could screen medical tourists or patients 
returning from hospitals abroad to identify 
carriers of multiple resistant strains. Hospi- 
tals that are centres of international travel 
for medical treatment must lead the way; 
funding and learning mechanisms must be 
increased for other hospitals to follow suit. 

The International Health Regulations, 
revised by WHO member states in 2005, 
are a legally binding instrument that aims 
to provide global surveillance and response. 
Properly financed, they could be effective. 
Yet the resources needed to respond 
to emerging diseases do not flow to low- and 
middle-income countries as readily as they do 
to countries in the global north, a 
key lesson of the recent Ebola outbreak. All 
governments have a collective responsibility 
to improve capacities for rapid response to 
resistance. Greater support by donor coun- 
tries to new and existing funding mecha- 
nisms such as the Global Fund to Fight 
AIDS, Tuberculosis and Malaria is needed 
in low- and middle-income countries. 


EXTEND COALITIONS 

International and national coalitions must be 
broadened. The global action plan strength- 
ens the established collaboration between 
the WHO, FAO and OIE. This should be 
extended to cover other relevant sectors, 
including trade, development and environment. 
The model set up by UNAIDS (the Joint 
United Nations Programme on HIV/AIDS) 
in 1996 serves as an example of how to inten- 
sify collaboration, leverage resources, involve 
more parties and reduce barriers. 

The UN meeting must commit to driving 
learning between institutions. Global plat- 
forms are needed for sharing best practices 
and the latest data about resistance levels and 
antibiotic consumption, for instance, among 
national agencies. Such exchange happens 
in Europe for resistant human bloodstream 
infections, and human and veterinary anti- 
microbial consumption. This must be scaled 
up to monitor resistance in communities, 
the food industry and the environment. A relevant 
model for exchange at the global level 
is the WHO's Pandemic Influenza Prepar- 
edness Framework. To engage the public 
effectively, more-frequent updating, vivid 
visualizations and engaging communica- 
tions are needed. 

As in the Paris climate agreement, 
countries should submit to the UN voluntary 
but monitored targets on limiting resistance. 
Parties may go further by making shortfalls 
subject to potential sanctions. A key prior- 
ity is to establish measurable indicators at 
the country level, such as the median yearly 
consumption of antibiotics per person. 

As with the climate issue, non-state actors 
from business to civil society can be central 
to societal transformations. Such stakehold- 
ers were consulted during the development 
of the WHO global action plan. But their 
participation in the long run must become 
more integral to the global coalition respon- 
sible for tackling resistance. 

Available governance instruments range 
from binding treaties to guidelines, with 
each approach having pros and cons. A 
first step to holding companies accountable 
would be an international code on 
the promotion of antibiotics (promotional 
spending in the United States in 1998 
amounted to US$1.6 billion), akin to that 
adopted by the WHO in 1981 on the marketing 
of breast-milk substitutes. 

[Figure: ‘Farm forecast’. By 2030, the use of antimicrobials in agriculture in Asia alone could equal 82% of global agricultural consumption in 2010. The drugs are largely given to promote growth or prevent infections, rather than to treat disease.] 


ACT NOW 
The complexity and gravity of resistance 
call for the immediate mass mobilization 
of society. Maintaining the susceptibility of 
microbes to drugs for global health is a matter 
of sustainable development. Improving 
understanding about humankind’s dependence 
on the global microbiome should lead 
to action on many other important issues 
involving microorganisms. These 
issues include infectious diseases, 
food security, natural resources and 
environmental conservation. Action here 
could, in turn, lead to more-equitable forms 
of national progress across the sustainable-development 
goals. 

Building global resilience to resistance is a 
long game. But changes can be surprisingly 
fast when the time is ripe and a plan is ready. 
This month’s UN high-level meeting is a rare 
opportunity for global collective action on 
human interactions with microbes. It must 
protect both the lifesaving power of anti- 
biotics and the ability to use them when 
necessary. 
[Figure: Wells in 1931, about to leave London for a tour of the United States.] 


SCIENCE JOURNALS 


The worlds of H. G. Wells 


Simon J. James looks back at the richly varied contribution of the science-fiction 
writer and science popularizer. 


Herbert George Wells (1866–1946) occupies a singular place in science 
and culture. Practically reinventing science fiction in landmark books such 
as The War of the Worlds, he also wrote prolifically on science, education, history 
and politics: in a career spanning 6 decades, he penned more than 150 books and 
pamphlets, as well as numerous articles in, and letters to, the press. Living through 
the late-nineteenth-century burgeoning of the sciences, the societal and technological 
upheavals of the early twentieth century and two world wars, Wells both absorbed 
revelations and delivered some, foreseeing powered flight, space travel, tanks and the 
atomic bomb, and becoming an enthusiastic and committed popularizer of science. 

Behind Wells's enormous output was a 
desire to use writing to make the world better, 
by projecting either a utopian vision of a 
perfected future, or dystopias revealing how 
the lessons of his work went unheeded. 
Among his extraordinary achievements, 
Wells was one of the earliest major English 
writers to be a trained scientist. The word 
‘scientist’ had been coined by historian 
William Whewell just 33 years before Wells's 
birth. Wells — the child of servants-turned- 
shopkeepers — escaped apprenticeships in 
drapers’ shops to become a pupil-teacher 
at Midhurst Grammar School in the south 
of England. A scholarship propelled him to 
what is now Imperial College London, where 
he studied biology under champion of Dar- 
winism T. H. Huxley, graduating in 1890. He 
never practised as a scientist; nor did he see 
himself as an ‘artist’, preferring ‘journalist’, 
particularly later in his career, when politics 
became more important in his writing. 
Wells’s brilliance as a communicator of 
science drew him to many friendships with 
scientists — not least Richard Gregory. The 
astronomer, who was at university with Wells, 
was Nature’s second editor. Wells was to pub- 
lish 25 pieces in the journal over 50 years, 
inspiring and provoking scores of contempo- 
rary thinkers into contributing a rolling tide 
of correspondence, book reviews, notices and 
other commentary on his output. 

Wells was also publishing inspired books 
at a furious pace. His first were the scientific 
textbooks Honours Physiography and Text- 
book of Biology (both 1893); the latter went 
into many editions. The topics rapidly ramified. 
The year 1895 alone saw a short-story 
collection (The Stolen Bacillus and Other 
Incidents), a fantastic romance in which an 
angel falls to Earth (The Wonderful Visit) 
and a volume of essays, as well as his 
first full-length work of fiction, The Time 
Machine. That book, with Wells’s other 
late-1890s ‘scientific romances’ The Island 
of Doctor Moreau, The War of the Worlds 
and The Invisible Man, would set the bar for 
science fiction. They are also among a num- 
ber of books by Wells that had an impact on 
science itself. 

The War of the Worlds inspired Robert 
Goddard — inventor of the liquid-fuelled 
rocket, whose research led to NASA’s Apollo 
programme — to devote his life to space 
travel. The book’s “heat-rays” also presaged 
military lasers. The hero of The Island of Doc- 
tor Moreau, Edward Prendick, “had spent 
some years at the Royal College of Science, 
and had done some researches in biology 
under Huxley”; the book’s animal-human 
hybrids are rough precursors to today’s 
embryonic chimaeras. Wells's 1914 The World 
Set Free predicted the atomic bomb, drawing 
on and subsequently influencing chemist 
Frederick Soddy’s work on radioactivity, and 
influencing physicist Leo Szilard in his work 
on the neutron chain reaction. The Shape of 
Things to Come (1933) foreshadows the Sec- 
ond World War, and its 1936 film adaptation 
Things to Come (produced by Alexander 
Korda and starring Raymond Massey) ends 
with humanity launching its first spacecraft. 

Wells was irritated by comparisons to fel- 
low science-fiction giant Jules Verne. The feel- 
ing was mutual. Verne complained that the 
antigravity metal cavorite in Wells's The First 
Men in the Moon (1901) was pure invention, 
compared to the gunpowder-fuelled rocket 
in his own 1865 From the Earth to the Moon. 
But Wells's main interest was never technol- 
ogy. After inventing the insectoid bodies of 
the Selenites in The First Men in the Moon, 
or the mind-reading aliens of 1937’s The 
Camford Visitation, he went on to imagine 
the significance of these fantastic elements 
for human psychology and culture, setting a 
template that has since been followed by the 
most literary of science fiction (from the likes 
of Margaret Atwood and China Miéville). 

[Figure: An illustration for The War of the Worlds drawn by Henrique Alvim Corrêa (top) and a still from the 1936 film adaptation of The Shape of Things to Come.] 

Wells was also honing his journalistic 
skills. His first essay in Nature, ‘Popularising 
Science’ (Nature 50, 300–301; 1894), 
asks for standards to be set in popular 
scientific writing to promote accessibility. 
He would go on to publish Nature 
articles on a range of subjects (see 
John S. Partington’s 
admirable and comprehensive H. G. Wells 
in Nature, 1893–1946; Peter Lang, 2008). 
But education, more than fiction, science 
or indeed science fiction, was to become the 
keynote of Wells's writing career. 

Owing, in part, to his own escape from 
apprenticeship into an intellectual life, Wells 
was driven by the conviction that education 
was paramount to clear thinking and effi- 
cient, happy lives. Even his most fantastic, 
futuristic writings contained lessons for the 
present, intended to lead to a more utopian 
ordering of the world. A lecture to the Royal 
Institution of Great Britain, published as 
‘The Discovery of the Future’ (Nature 65, 
326-331; 1902), offers a window on the 
development of these ideas, arguing for the 
importance of conscious forward-thinking: 


We travel on roads so narrow that they suf- 
focate our traffic; we live in uncomfortable, 
inconvenient, life-wasting houses out of a love 
of familiar shapes and familiar customs and 
a dread of strangeness; all our public affairs 
are cramped by local boundaries impossibly 
restricted and small. Our clothing, our habits 
of speech, our spelling, our weights and meas- 
ures, our coinage, our religious and political 
theories, all witness to the binding power of 
the past upon our minds. 


For Wells, the scientific method conferred 
on its user the authority to rethink and chal- 
lenge these stale ideas, and should underpin 
every area of human endeavour. (This posi- 
tivistic idea of science was fairly short-lived, 
lasting only from Charles Darwin's dethron- 
ing of humanity as the summit of creation to 
the early-twentieth-century advent of quan- 
tum mechanics, which undermined claims 
of absolute scientific certainty.) But Britain's 
educational system failed to enshrine science 
properly, Wells felt; the privileged status of 
classics was a consistent target of his ire. The 
result was global woe: “to defective educa- 
tion was due the general neglect of science 
and ‘muddling through’”, as he told the 11th
annual meeting of the British Science Guild
(Nature 99, 186-187; 1917). His hope was 
that, if the intellectual enquirer were armed 
with the right kinds of knowledge, history 
might be predicted like the movements of 
planets and tides. Then, informed by the 
knowledge of humanity’s shared evolution- 
ary origins, the history of the future would see 
nation states dissolving in favour of a system 
of cooperative world government. 

Wells's significance over most of his career
rested on his status as a public intellectual, 
and he relished the international audience 
reached by his publications. His prescience 
was a vital element of his popularity, and not 
just in science fiction. For instance, he imag- 
ined something like a World State-sponsored 
Wikipedia. In an address to the Royal Institu- 
tion in 1936 on the “World Encyclopaedia” or 
“World Brain”, he described it as:


the mental background of every intelligent
man in the world. It should be alive and growing
and changing continually, under revision,
extension and replacement from the original
thinkers in the world everywhere. Every university
and research institution should feed
it. Every fresh mind should be brought into
contact with its standing editorial organization
... its contents would be the standard
source of material for the instructional side of
school and college work, for the verification
of facts and the testing of statements — everywhere
in the world.


Wells recording for the BBC (top) and during his
biology studies at university.


World Brain (1938) amplified these ideas. 
This book, with the 1920 The Outline of His- 
tory — a best-selling opus on the story of 
humanity from its evolutionary origins to his 
hoped-for utopia — was Wells's response to 
the catastrophe of the First World War. 

Wells lived to see the catastrophe of the 
second. Having witnessed such a failure to act 
collectively, his final contribution to Nature, 
in 1944, was an attempt to understand the 
actions and motivations of the individual. 
‘The Illusion of Personality’ suggests that the 
notion of a stable personality is an illusion, 
because consciousness constantly flits from 
one moment to the next (Nature 153, 395- 
397; 1944). Reading the piece now, it is fasci- 
nating to see a writer so long concerned with 
thinking on a global scale, and over hundreds 
to thousands of years, preoccupied at the end 
of his career with the micro-impressions of a 
single, impermanent sensibility. 

Wells knew, and argued with, most of the 
significant writers and political leaders of the 
late nineteenth and early twentieth centuries.
Two friendships were constant: one with
fellow novelist Arnold Bennett, the other with 
Gregory. Before he became editor of Nature, 
Gregory had co-authored Honours Physiog- 
raphy with Wells; he was an assistant editor 
at the journal when Wells, a then-unknown 
teacher and jobbing science writer, published 
‘Popularising Science’. Gregory advised Wells
on lunar gravity for The First Men in the 
Moon; and when Wells died in 1946, Greg- 
ory wrote the Nature obituary of the genius 
with whom he had first collaborated 50 years 
before (Nature 158, 399-402; 1946). Gregory's 
review of The War of the Worlds (Nature 57, 
339-340; 1898) had ventured that “scientific 
romances are not without a value in further- 
ing scientific interests; they attract attention to 
work that is being done in the realm of natural 
knowledge, and so create sympathy with the 
aims and observations of men of science”. To 
attract attention and create such sympathy 
was Wells's steadfast aim.
The original crew of the USS Enterprise.

Boldly going for 50 years


Sidney Perkowitz scans the impacts of Star Trek on science, technology and society. 


Half a century ago, in September 1966,
the first episode of Star Trek aired
on the US television network NBC.
NASA was still three years short of land- 
ing people on the Moon, yet the innovative 
series was soon zipping viewers light years 
beyond the Solar System every week. After 
a few hiccups it gained cult status, along 
with the inimitable crew of the starship USS 
Enterprise, led by Captain James T. Kirk 
(William Shatner). It went into syndication 
and spawned 6 television series up to 2005; 
there are now also 13 feature films, with Star 
Trek Beyond debuting in July this year. 

Part of Star Trek’s enduring magic is its 
winning mix of twenty-third-century tech- 
nology and the recognizable diversity and 
complexity enshrined in the beings — human 
and otherwise — created by the show’s origi- 
nator Gene Roddenberry and his writers. 
As Roddenberry put it, “We stress humanity.”
The series wore its ethics on its sleeve
at a time when the Vietnam War was rag- 
ing and anti-war protests were proliferating, 
along with racial tensions that culminated in 
major US urban riots in 1967-68. Rodden- 
berry’s United Federation of Planets, a kind 
of galactic United Nations, is an advanced
society wielding advanced technology, and 
the non-militaristic aims of the Enterprise are 
intoned at the beginning of every episode in 
the original series (TOS): “To explore strange 
new worlds; to seek out new life and new civilizations;
to boldly go where no man [later, ‘no
one’] has gone before.”

Over the decades, Star Trek technologies 
have fired the imaginations of physicists, 
engineers and roboticists. Perhaps the most 
intriguing innovation is the warp drive, 
the propulsion system that surrounds the 
Enterprise with a bubble of distorted space- 
time and moves the craft faster than light to 
traverse light years in days or weeks. In 1994, 
theoretical physicist Miguel Alcubierre 
showed that such a bubble is possible within 
Albert Einstein's general theory of relativ- 
ity, but would demand massive amounts of 
negative energy, also known as exotic mat- 
ter (M. Alcubierre Class. Quantum Grav. 11, 
L73; 1994). This is not known to exist except
(possibly) in minuscule quantities; and some 
physicists speculate that the Alcubierre drive 
might annihilate the destination star system. The
warp drive remains imaginary — for now. 

However, another application of warped 
space-time in the series has been realized: a 
cloaking device that shields spacecraft from 
view by bending light around them. In 2006, 
electrical engineers David Smith and David 
Schurig built a ‘metamaterial’ electromagnetic 
cloak that hid an object from microwaves by 
refracting them to pass around it, much as 
water flows around an obstacle (D. Schurig 
et al. Science 314, 977-980; 2006). Now, simi- 
lar diversionary tactics are being used to hide 
small objects under visible light, for instance 
by electrical engineer Xingjie Ni and his colleagues,
who devised a “skin cloak” 80 nanometres
thick to do the job (X. Ni et al. Science
349, 1310-1314; 2015). 

The exotic Enterprise transporter, which 
instantaneously dematerializes and teleports 
people and things (inspiring the catchphrase 
“Beam me up”), was supposedly conceived to 
save the costs of staging repeated spaceship 
landings. It has a real analogue in quantum 
teleportation. In 2015, for instance, quantum
optics researcher Hiroki Takesue and his col- 
leagues harnessed entanglement to send the 
properties of one photon to another over 
100 kilometres of optical fibre (H. Takesue 
et al. Optica 2, 832-835; 2015). Above the 
atomic level, however, we're a long way from 
teleporting entire organisms or objects.

Other Star Trek technologies anticipated
modern trends. The tricorder that TOS medic 
Leonard ‘Bones’ McCoy (DeForest Kelley) 
uses for diagnosis has spawned real devices, 
such as SCOUT from medical-technology 
company Scanadu in Moffett Field, Cali- 
fornia. Meanwhile, activity trackers already 
perform basic health monitoring, recording 
pulse rate, calorie intake and quality of sleep.

Artificial intelligence has begun to emerge
in technologies such as speech recognition 
by Apple's personal-assistant program Siri, 
Google's self-driving car and the ‘all-terrain’ 
Atlas robot created for the US Defense 
Advanced Research Projects Agency. All are
significant developments that could pave the 
way to an eventual approximation of Lieu- 
tenant Commander Data (Brent Spiner), the 
sentient android who debuted on television 
series The Next Generation in the late 1980s.

Star Trek’s holodeck — the immersive
virtual-reality environment in which the 
Enterprise crew visits simulated locales — 
is also years away, but huge advances in the 
technology are afoot. The Oculus Rift headset,
for instance, provides a visual and auditory
virtual-reality experience, but must be
tethered to a computer, thus falling short of
delivering the seamless holodeck experience. 

Three-dimensional printers, which lay 
down successive layers of material to form 
intricate shapes, are now being adapted to 
handle food, perhaps a step towards Enter- 
prise meal replicators. The Creative Machines 
Lab, then at Cornell University in Ithaca, New 
York, designed one model as part of its open-access
Fab@Home project, and Natural Machines in
Barcelona, Spain, touts its Foodini printer as
simplifying the making of textured or layered
foods such as ravioli.

More generally, and arguably with greater 
long-term significance, Star Trek raised 
enthusiasm for space exploration and sci- 
ence. In 1975, fans convinced NASA to name 
its first test space shuttle orbiter Enterprise 
(the craft was unpowered and never reached 
space). And many young would-be scientists 
have found the series inspirational. 

Its social message has been no less impor- 
tant. The federation ethic ensured that Kirk, 
Next Generation Captain Jean-Luc Picard 
(Patrick Stewart) and their successors ‘waged 
peace’ even when confronted by aliens such 
as the Klingons, a people genetically pre- 
disposed to hostility. The February 1968 
episode ‘A Private Little War’, an allegory
about Vietnam, was a pointed example. Roddenberry
believed that humanity must learn
to delight in difference, even in alien life-forms,
and ready itself to “meet the diversity
that is almost certainly out there”.

The series’ futuristic technologies have
inspired real-life innovations — some further
advanced than others. A version of the warp
drive that propelled the USS Enterprise faster
than light (1) was proposed by physicist Miguel
Alcubierre in 1994 (2), but remains conceptual.
The diagnostic tricorder (3) has been realized
in Scanadu’s SCOUT (4) and app, which
measure vital signs such as blood pressure.

Star Trek’s portrayal of human diversity and
refusal to engage in national exceptionalism 
remain landmark achievements. Emerging 
at a time of racial exclusion in US television, 
the TOS crew included Lieutenant Nyota Uhura
(Nichelle Nichols), the first prominent Afri- 
can American female role in a US television 
series, as well as the ‘pan-Asian’ helmsman
Hikaru Sulu (George Takei), Russian navi- 
gator Pavel Chekov (Walter Koenig) — and, 
of course, Leonard Nimoy’s star turn as half- 
Vulcan Commander Spock. Native Ameri- 
can first officer Chakotay (Robert Beltran) 
emerged in the series Voyager (1995-2001). 
The gender balance tended to the heavily 
male until the advent of Voyager Captain 
Kathryn Janeway (Kate Mulgrew), with half- 
Klingon chief engineer B’Elanna Torres (His- 
panic actress Roxann Dawson). Real-world 
impacts abound. Nichols, for instance, has 
related how US civil-rights leader Martin 
Luther King urged her to remain in the series 
when she was considering other professional 
options. Her character, in turn, inspired astro- 
naut Mae Jemison, the first African American 
woman to be sent into space by NASA. 

Fifty years later, how does our world 
compare with Roddenberry’s universe? 
The changes in technology are transfor- 
mational; and although interstellar travel 
has yet to become reality, NASA's projected
2030s human mission to Mars follows the
dream “to boldly go”. The progressive social
values that Star Trek pioneered on television 
are now much more widely held. But new 
conflicts and geopolitical stand-offs have 
erupted, despite efforts by our own federa- 
tion, the United Nations. Amid these shifts 
and tensions, this vastly influential fran- 
chise continues to carry a subtle but clear 
message: we can be better than we are.
Report released on
antibiotic resistance 


The Wellcome Trust today
releases a report to inform
the United Nations General
Assembly’s High-level Meeting
on Antimicrobial Resistance later
this month (see
www.wellcome.ac.uk/drugresistantinfections).
The report distils the findings
of an international summit of
researchers, policymakers and
multilateral institutions that met
in London in April 2016.

It identifies three areas for 
immediate action to alleviate 
the current and future impact of 
drug-resistant infections on the 
number of deaths and on national 
economies. The summit and 
report build on the independent 
review on antimicrobial 
resistance led by economist Jim 
O’Neill and commissioned by the 
UK government, in partnership 
with the Wellcome Trust, which 
was published in May 2016 (see 
go.nature.com/2bsxoyi). Together, 
these reports should help to focus 
attention and galvanize support 
from national governments, 
the G7 and G20 countries, 
international agencies and non- 
governmental organizations. 

The UN resolution on 
antimicrobial resistance should 
commit governments and 
international organizations 
to concerted and verifiable 
action, adapted locally as 
necessary. Continued support 
for scientific research and 
innovation is essential to shape 
future responses, but the need 
for further research must not be 
an excuse for delaying urgent 
interventions. 

Jeremy Farrar Wellcome Trust, 
London. 

Sally Davies Department of 
Health, London. 
j.farrar@wellcome.ac.uk 


Antibiotic partners 
promote discovery 
As president of the Infectious
Diseases Society of America
and a physician of infectious
diseases, I am greatly encouraged
by the launch of the Combating
Antibiotic Resistant Bacteria
Biopharmaceutical Accelerator
(CARB-X; see Nature
http://doi.org/bp7x; 2016). Contrary
to your implication, this
public-private partnership is 
designed to foster antibiotic 
discovery as well as preclinical 
antibiotic development (see 
www.carb-x.org). Physicians who 
treat the increasing numbers of 
people with infections caused by 
multidrug-resistant organisms 
know at first hand the urgent 
need for novel antibiotics. 

Most pharmaceutical 
companies have been retreating 
from antibiotic research and 
development (R&D) over 
the past few decades because 
of economic, regulatory 
and scientific hurdles. Fresh 
incentives are needed to 
stimulate and support all 
stages of antibiotic R&D if new 
drugs are to be discovered and 
brought to market in a timely 
fashion. CARB-X can play an 
important part in this broader 
effort, which must also include 
other economic and regulatory 
incentives that are currently 
under consideration in the 
US Congress. 

Johan S. Bakken St. Luke’s
Medical Center, Duluth, 
Minnesota, USA. 
jbakken1@d.umn.edu 


Brexit threatens 
China collaboration 


Brexit — Britain's exit from the 
European Union — threatens 
to undermine the country’s 
scientific relationships with 
nations outside the EU (see 
E. Masood Nature 535, 467; 
2016). The country will need 
to invest more to maintain its 
valuable collaboration with 
China, for example, once EU 
funding is withdrawn. 

The United Kingdom could be 
excluded from exchanges under 
the EU’s Marie Skłodowska-Curie
fellowship programme,
which have benefited thousands
of talented Chinese and British
scientists since 2007. For the 
country to retain its current 
exchange level of international 
research talent, it would need to 
invest more in its Newton Fund 
to make up the shortfall (see 
go.nature.com/2bfgzq3). 

Also under threat will be 
British scientists’ participation 
in China’s projects with the EU, 
such as its Five-hundred-meter 
Aperture Spherical Telescope, 
due to be completed this year 
(see W. Yang Nature 534, 
467-469; 2016). 

Despite such uncertainties, 
Brexit could still provide 
opportunities to strengthen 
scientific collaboration between 
the world’s second and fifth 
largest economies — for example, 
through the collaboration 
between Research Councils 
UK and the National Natural 
Science Foundation of China (see 
go.nature.com/2bflysi). 

Hong Yang Norwegian Institute 
of Bioeconomy Research; and 
University of Oslo, Norway. 
Roger J. Flower University 
College London, UK. 

Xianjin Huang Nanjing 
University, China. 
hongyanghy@gmail.com 


Gap widens for 
honorary PhDs 


The changing nature of the 
standard PhD degree (Nature 
535, 26-28; 2016) could 
make the honorary PhD 
seem increasingly hollow by 
comparison. 

Universities confer honorary 
doctorates on those who have 
attained national or international 
prominence in the arts, sciences 
or sporting fields. Scholarly skills 
are rarely considered, although 
most recipients have sufficient 
expert knowledge to potentially 
write a thesis. Recipients often 
have links with the awarding 
institute, which benefits from the 
associated publicity. 

By contrast, the standard 
PhD is awarded in recognition 
of research expertise. Now
that more PhD graduates
pursue careers outside 
academia, programmes are 
placing greater emphasis on 
transferable scholarly skills and 
on developing management, 
entrepreneurship and teamwork 
skills. These additional training 
requirements threaten to stoke 
academic tensions over the 
existing gap in scholarship 
between standard and honorary 
PhDs. 

Steven Watterson University of 
Ulster, Londonderry, UK. 
s.watterson@ulster.ac.uk


Bite like a spider, 
sting like a scorpion 


The image of a Togo starburst
tarantula (Heteroscodra 
maculata) on your Contents page 
in the print issue (Nature 534, 
433; 2016) is incorrectly titled 
‘Sting like a spider’. Spiders do
not sting, they bite. 

Arthropod bites and stings 
are differentiated by the nature 
and purpose of the stinging or 
biting apparatus, and by their 
clinical effects (see J. Goddard 
Physician’s Guide to Arthropods 
of Medical Importance; CRC 
Press, 2012). Bees and scorpions, 
for example, inject their stings 
mainly for defence; spiders bite 
usually to immobilize or kill 
their prey, by injecting venom 
from their fangs. These insults 
typically result in one or two 
puncture marks, respectively, 
in the victim’s skin, serving as 
useful indicators for diagnosis 
and treatment. 

Reza Afshari BC Centre for 
Disease Control, Vancouver, 
Canada. 
reza.afshari@bccdc.ca 


OBITUARY 


Ahmed Hassan Zewail 


(1946-2016) 


Nobel-winning inventor of femtochemistry and statesman. 


That the first science Nobel
prizewinner from the
Arabic-speaking world,
Ahmed Hassan Zewail, pioneer
of ultrafast chemistry, was also
a diplomat is apparent in his
unique list of distinctions. Few
scientists can have been garlanded
by foundations in both
Israel and Saudi Arabia, served
in the Pontifical Academy of
Sciences — and had their face
on several postage stamps while
living. He died on 2 August 2016,
aged 70.

The eldest child of a middle- 
class family, Zewail was born on 
26 February 1946. He grew up 
in Desouk, Egypt, a small town 
80 kilometres from Alexandria. 
After a state-school education, 
he took undergraduate and master’s degrees
in chemistry at the University of Alexan- 
dria. He then decided to further his studies 
in the United States, despite a fairly weak 
command of English (a fact which shocks 
those of us who knew this eloquent speaker 
later on). 

He did his graduate work on novel 
spectroscopies, including optically 
detected magnetic resonance, with Robin 
Hochstrasser at the University of Penn- 
sylvania in Philadelphia. His postdoctoral 
work was on coherence in multidimen- 
sional systems and energy transfer in solids, 
with Charles B. Harris at the University of 
California, Berkeley. In 1976 he joined the 
California Institute of Technology in Pasa- 
dena, where he remained for the rest of his 
career, rising to become the Linus Pauling 
professor of chemistry in 1995. As with Pauling,
his reputation and impact would truly 
transcend science. 

Throughout the 1980s and most of the 
1990s he led his group to do experiments on 
‘femtochemistry’ — his coinage for causing 
and watching reactions using light pulses 
lasting much less than a picosecond (a mil- 
lionth of a millionth of a second). This is the 
timescale of chemical reactions at the molec- 
ular level — the timescale of vibrations and 
nuclear motions. For this work he became 
the sole recipient of the 1999 Nobel Prize in 
Chemistry. Before the advent of such ultra- 
fast lasers in the 1970s, chemists’ ideas of the 
dynamics of molecules in excited states were 
very different from today’s. They believed
that the dominant force was intramolecular
relaxation, and that this was largely 
incoherent. 

Zewail’s work shattered this picture. 
Through elegant experiments, his group 
unravelled reaction dynamics, clarified 
molecular pathways and illuminated the 
quantum-mechanical evolution of atoms 
in molecules. The classic tool for him was 
the pump-probe experiment. Here the first 
pulse (the pump) started a chemical reac- 
tion, and the second (the probe) monitored 
what happened next. In this way he and his 
team took snapshots of vibrational flow, 
state rearrangement and reaction prod- 
ucts. These revealed a much deeper role for 
coherence than anyone had anticipated. 

After his Nobel Prize, Zewail’s focus 
shifted towards a new form of microscopy 
that used ultrafast pulses of electrons to track 
reactions in space and time at the atomic 
scale. It was no secret that he hoped (like 
Pauling) to win a second Nobel Prize: Zewail 
was not one to rest on his laurels. Once again, 
truly elegant science resulted. 

Generations of talented students benefited 
greatly from his insight. He longed to boil 
down complex phenomena to the simplest 
underlying dynamics. He would urge his co- 
workers to avoid getting mired in detail. A 
very, very busy man, he still kept close tabs 
on his sub-basement labs. He would pop 
in during an experiment without warning, 
poke around, and start asking questions, 
saying he just wanted to “smell the cooking”. 

His career and influence were shaped by 
the view that science transcends
political borders. One early inci- 
dent illustrates this. In January 
1983, he organized the Inter- 
national Conference on Photo- 
chemistry and Photobiology at 
his alma mater in Alexandria. 
He was clearly a rising star, but the 
work that would lead to the Nobel 
had barely started. The confer- 
ence was a major milestone in 
his career, attracting a stunning 
collection of distinguished inter- 
national scientists, with an obvious 
subtext of bootstrapping progress 
for Egyptian science in general. 

Given the tumultuous politics 
of the region then, as now, it was 
no surprise that as soon as Israeli 
scientists arrived, most of the 
Arab representatives were ordered 
by their governments to leave. Zewail surely 
would have expected this and could have 
taken the easy way out — asking his Israeli 
friends (who were numerous) to stay away. 
Instead he publicly denounced at the con- 
ference the destructive acts of the same 
governments he was trying to help guide to 
modernity. 

Zewail never lost his drive to modernize 
science in the Arabic-speaking world. In 
speeches and articles he reminded his coun- 
trymen of the historical greatness of their 
science, and encouraged them to build to 
greatness again through investment in edu- 
cation and fundamental research. He was 
the driving force behind the Zewail City of 
Science and Technology in 6th of October City,
Giza. After a troubled gestation due to politi- 
cal instability, this new university finally 
opened in 2013, with institutes intended to 
cover all the fields required for development 
of Egyptian society. Zewail was also on US 
president Barack Obama’s Council of Advi- 
sors on Science and Technology for four years 
and served as the US science envoy to the 
Middle East. 

With his death, we have lost a talented 
scientist and true statesman of the world.


Warren S. Warren is the chair of the Physics 
Department at Duke University, and James 
B. Duke Professor of Chemistry, Physics, 
Radiology and Biomedical Engineering. He 
was a postdoctoral fellow with Zewail from 
1981 to 1982. 

e-mail: warren.warren@duke.edu 
DRUG DISCOVERY


Designing the ideal opioid 


The development of a drug that mimics the pain-relieving activity of opioid compounds, but has fewer side effects, points 
to an effective strategy for the discovery of many types of drug. SEE ARTICLE P.185 


BRIGITTE L. KIEFFER 


Opium has been used medicinally
and recreationally for more than
4,000 years because of its remarkable
pain-relieving and euphoria-inducing
properties¹. Today, abuse of prescription opioids
— morphine and its derivatives — has
escalated², and heroin addiction represents
a worldwide health and societal burden. An
ideal opioid would kill pain potently without
producing morphine’s harmful respiratory
effects, would show sustained efficacy in
chronic treatments and would not be addictive.
On page 185, Manglik et al.³ describe a
step towards this perfect drug.

It has been a long road. It was naively
thought that identifying receptor proteins for
morphine would rapidly deliver the ideal opioid.
In the early 1990s, three opioid-receptor
(OR) genes were isolated that encode the
G-protein-coupled receptors (GPCRs) mu
(μOR), delta and kappa⁴. Genetic disruption
of μOR in mice revealed that this protein mediates
morphine-induced pain relief, reward and
dependence all at once⁵. This discovery, coupled
with the fact that thousands of morphine-related
drugs had no better pharmacology than
conventional opioids, dampened enthusiasm
for developing μOR-targeting drugs.

The realization that distinct drugs acting at
a given receptor can trigger diverse signalling
responses⁶ has since opened up the possibility
of designing ‘biased’ opioids that activate signalling
pathways relevant to therapy, but not
those that produce unwanted effects. However,
another breakthrough was required to move
the field effectively to the next level — the
development of a method to crystallize these
rare, unstable membrane proteins. This technique
has transformed GPCR research, leading
to resolution of the structure of many proteins⁷,
including μOR (ref. 8). Today, the availability
of these crystal structures allows researchers to
probe both the active and inactive conformations
of GPCRs and the ways in which they
bind their ligands, facilitating structure-based
drug discovery⁹.

As part of this effort, Manglik et al. undertook
a search for a molecule that would bind to
μOR. Their goal was to use the power of
computational docking to find new opioid
structures (chemotypes), in the hope that some
might stabilize μOR in as-yet-unexplored
conformations, show unique, biased signalling
profiles, and perhaps generate previously
unseen biological effects.

Figure 1 | New biology for an old receptor. a, The opioid molecule morphine is derived from poppies.
Morphine binds to the μ opioid receptor (μOR) protein in the mammalian brain to form an active
complex with signalling proteins, including Gi/o and β-arrestin. The Gi/o signalling pathway is
thought to mediate morphine’s pain-relieving properties, whereas β-arrestin signalling results in
unwanted side effects — euphoria, which can lead to addiction, as well as respiratory depression and
gastrointestinal effects. b, Manglik et al.³ used the crystal structure of μOR to develop a computational
screening programme. The authors docked 3 million molecules to the μOR binding site, selected the
most promising candidates and then tested and optimized these to produce the drug PZM21. This
compound produces highly Gi/o-biased signalling, and effectively reduces pain in mice without other
detectable effects. (Me, methyl.)

The authors computationally docked 
3 million commercially available molecules to 
the μOR binding pocket. For each compound,
more than 1 million configurations were tested 
for complementarity to the binding site, and 
the 2,500 best-fitting molecules were examined 
by eye to identify those with chemotypes unre- 
lated to known opioids. The authors selected 
23 compounds for experimental testing, and 
further docking-testing rounds produced a set 
of molecules that had novel chemotypes, unu- 
sual docking poses in the receptor-binding site, 
and reasonable binding affinities and selectivity
for μOR.
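
The screening funnel described above (dock every molecule in many poses, rank by best fit, shortlist the top scorers, then keep only those unlike known opioids) can be sketched in a few lines of Python. This is a toy illustration, not the authors' pipeline: the `Candidate` type, the integer-set fingerprints, the pose scores and the Tanimoto similarity cut-off are all assumptions made for the example, and the automated similarity filter merely stands in for the visual inspection the authors carried out by eye.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical library molecule with made-up docking results."""
    name: str
    fingerprint: FrozenSet[int]      # toy stand-in for a chemical fingerprint
    pose_scores: Tuple[float, ...]   # one score per sampled pose; lower = better fit


def best_pose_score(candidate: Candidate) -> float:
    """A molecule is judged by its best-fitting (lowest-scoring) pose."""
    return min(candidate.pose_scores)


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two feature sets: shared features / all features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def screen(
    library: Sequence[Candidate],
    known_opioids: Sequence[FrozenSet[int]],
    top_n: int,
    max_similarity: float,
    n_select: int,
) -> List[Candidate]:
    """Rank by best pose, shortlist top_n, drop look-alikes of known opioids."""
    ranked = sorted(library, key=best_pose_score)
    shortlist = ranked[:top_n]
    novel = [
        c for c in shortlist
        if all(tanimoto(c.fingerprint, k) <= max_similarity for k in known_opioids)
    ]
    return novel[:n_select]


# Invented data: four molecules and one "known opioid" fingerprint.
LIBRARY = [
    Candidate("A", frozenset({1, 2, 3}), (-5.0, -9.0)),  # fits well, but opioid-like
    Candidate("B", frozenset({7, 8}), (-8.0, -2.0)),
    Candidate("C", frozenset({1, 9}), (-7.5,)),
    Candidate("D", frozenset({5, 6}), (-1.0, -3.0)),     # poor fit, misses the shortlist
]
KNOWN_OPIOIDS = [frozenset({1, 2, 3, 4})]

selected = screen(LIBRARY, KNOWN_OPIOIDS, top_n=3, max_similarity=0.4, n_select=2)
print([c.name for c in selected])  # prints ['B', 'C']
```

In the study itself the corresponding numbers were 3 million molecules, a shortlist of 2,500 and 23 compounds chosen for testing; the sketch only mirrors the order of the steps.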

Activation of μOR triggers two major signalling
cascades — those involving Gi/o and
β-arrestin proteins. Manglik and colleagues
found that, of their 23 molecules, compound 12
had strongly biased activity for Gi/o signalling.
This is interesting because μOR agonists (activators)
that poorly engage β-arrestin signalling
are thought to confer more-efficient pain
relief and cause fewer side effects than those
that strongly activate this pathway. Indeed, a
drug named TRV130 that is unrelated to either
morphine-related drugs or compound 12 has
been developed on this basis using conventional
drug-screening methods and is currently
in phase III clinical trials. In their final optimization
step, the authors used docking information
from compound 12 to create a drug
dubbed PZM21 (Fig. 1). They then compared
PZM21 with morphine and TRV130.

In mice, the pain-relieving efficacy of
PZM21 was comparable to that of morphine
and lasted longer. PZM21 reduced pain
responses mediated by the central nervous
system, but not those mediated at spine level.
This activity has not previously been reported
for a μOR agonist, and potentially has therapeutic
value for targeting components of
pain mediated by the central nervous system.
The compound induced less constipation
than morphine and did not modify respiratory
activity. Strikingly, mice did not show a
preference for the testing chamber in which
they received PZM21 over the one in which they
received saline, and the compound did not
induce hyperactivity, both of which are signs of
addiction-like behaviour in mice.

TRV130 produced effective pain relief in 
all modalities, induced only subtle respira- 
tory depression and caused no significant 
place preference. Thus, despite slightly differ- 
ing effects in vivo, the pain-relieving proper- 
ties of both PZM21 and TRV130 supersede 
the adverse effects classically observed for 
morphine. Manglik and co-workers’ study 
therefore definitively establishes the promise 
of Gi/o-biased μOR agonists for pain control.

There is little doubt that structure-based 
computational screening will accelerate the 
pace of drug discovery. The current work
provides a compelling example of how this 
technology can efficiently generate chemo- 
types, enable rapid optimization of candidate 
molecules with minimal experimental test- 
ing, and lead to the discovery of molecules 
that have innovative biological activities. 
The open-access docking tools now available 
(such as http://blaster.docking.org) should 
expand the practice of this approach. 

Many challenges lie ahead in ligand-docking 
research. In particular, predicting biased activity
remains beyond reach, and was not a goal
of the present study. However, Manglik et al.
did find that PZM21 and TRV130 adopt distinct
docking poses in the μOR binding pocket.
Hence, molecular interactions common to the
PZM21-μOR and TRV130-μOR complexes
deserve further attention, because they may
contribute to selective Gi/o activation.

Whether the in vivo effects of PZM21 reflect 
only Gi/o-biased activity remains uncertain.
Similarities in the pharmacology of PZM21
and TRV130 argue in favour of common
modes of action for the two compounds,
probably stemming from Gi/o signalling. On
the other hand, the authors’ docking analyses
suggest that the compounds engage μOR
amino-acid residues in different ways. The 
drugs also show opposing activities when 
binding kappa opioid receptors in cells, and 
have different pharmacokinetics in vivo. The 
authors did not investigate whether animals 
develop tolerance to PZM21, and other in vivo 
activities of the drug may yet be discovered. 
The common and distinct actions of PZM21 
and TRV130 should be investigated in the 
brains of living organisms, which might reveal 
activities at the level of brain networks. 

In summary, Manglik and colleagues’ study
is an impressive demonstration that new
chemotypes can offer unusual biological
opportunities, particularly for the study of 
opioids. Are we getting closer to the ideal 
pain-reliever? PZM21 is a leading member of a
nascent club of pain-effective μOR agonists
that seem to have reduced risk for abuse.
These are not exactly opioids, and structure-based
discovery approaches should increase
their number and enhance the chances
of a successful drug reaching the market
at last.


Brigitte L. Kieffer is at the Douglas Mental 
Health Institute, Department of Psychiatry, 
McGill University, Montreal,
Quebec H4H 1R3, Canada.

e-mail: brigitte.kieffer@douglas.mcgill.ca 
Slippery when narrow


An experimental technique has been developed to measure water flow through 
carbon nanotubes. Measurements reveal that flow can be almost frictionless, 
posing challenges for computer simulations of nanofluidics. SEE LETTER P.210 


ANGELOS MICHAELIDES 


Carbon nanotubes are hollow cylinders
formed from carbon atoms arranged
in a hexagonal, graphite-like lattice
and have nanometre-scale diameters. It has
been suggested that water transport through 
carbon nanotubes is almost frictionless, and 
that the flow rate exceeds predictions made 
using classical theories by many orders of 
magnitude (see refs 1-3, for example). However,
because of challenges in performing
reliable measurements and computer
simulations, and given the huge differ- 
ences in the reported results, claims of rapid 
water transport have at times been met with 
scepticism (see, for example, ref. 4). On 
page 210, Secchi et al. help to resolve this
issue by reporting unambiguous measure- 
ments of water flow through individual 
carbon nanotubes. The unprecedented 
sensitivity of the authors’ measurements 
reveals a strong dependence of water friction 
on the radius of the carbon nanotube: the
narrower the tube, the less friction there is.

Figure 1 | Tracking minuscule water flow. Secchi et al. measured the flow of water passing through a
carbon nanotube (not visible) at the tip of a nanocapillary into a water reservoir by observing the motion
of polystyrene nanoparticles suspended in the reservoir. The trajectories of different nanoparticles are
indicated; colours correspond to the particle velocity, v (measured in micrometres per second).

Why is water flow through carbon nano- 
tubes of interest? One reason is that increas- 
ingly severe shortages of clean water are on 
the horizon, and so better water-purification 
and desalination technology is needed. Carbon 
nanotubes have generated excitement because 
measurements and computer simulations
have revealed that water can travel much more 
rapidly through these tiniest of pipes than, for 
example, salt ions can. Carbon nanotubes 
might therefore enable higher-performance 
filters that are more cost-effective than the 
conventional carbon-based filters currently 
ubiquitous in water-purification devices. 

To explore water flow through nanotubes, 
the authors built nanoscale devices in which 
two reservoirs of water were separated by a 
water-tight membrane pierced by an individual
nanotube. When the pressure on one
reservoir is raised, water flows through the nanotube to
the other. The flow is incredibly small: about a
femtolitre (10⁻¹⁵ litres) per second. To put this
in perspective, 1 fl is less than the amount of 
water in a single human red blood cell. 
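
To get a feel for how small this flow is, the arithmetic can be sketched in a few lines of Python. The flow rate of about one femtolitre per second is taken from the text; the one-millilitre target volume is an arbitrary illustration and not a figure from the study.

```python
# Rough arithmetic for a flow of about one femtolitre per second.
FEMTOLITRE_IN_LITRES = 1e-15            # SI prefix femto = 10^-15
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # Julian year, in seconds


def seconds_to_deliver(volume_litres, flow_litres_per_second):
    """Time needed to pass a given volume at a constant flow rate."""
    return volume_litres / flow_litres_per_second


flow = 1 * FEMTOLITRE_IN_LITRES   # litres per second (value quoted in the text)
target = 1e-3                     # one millilitre; illustrative choice only

seconds = seconds_to_deliver(target, flow)
years = seconds / SECONDS_PER_YEAR
print(f"{seconds:.1e} s, or roughly {years:,.0f} years")
```

At this rate, a single nanotube would need tens of thousands of years to pass one millilitre, which is why the water itself cannot be tracked directly.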

Given the minuscule amounts of water 
involved, Secchi and colleagues could not 
track the motion of water itself. Instead, they 
built on previously reported work by monitoring
how the jets of water emerging from
the nanotube displaced polystyrene nano- 
particles suspended in the low-pressure water 
reservoir (Fig. 1). Ignoring the differences in 
relative size, this is like counting the number 
of children sliding into a ball pit by watching 
the motion of the balls. The polystyrene nano- 
particles were large enough to be seen with an 
optical microscope, and, by tracking their 
motion, ultrasensitive measurements of water 
flow through the tubes were possible. This 
sensitivity is the key methodological advance 
of the study. 

Using this technique, the authors measured 
flow through carbon nanotubes that had dif- 
ferent radii, and through nanotubes built from 
boron nitride — a technologically promising 
material that forms nanotubes with a similar 
atomic structure to that of carbon nanotubes. 
The key metric commonly used to evaluate 
flow across surfaces and in confinement is 
known as slip length (see ref. 8, for example). 
Essentially, the larger the slip length, the more 
slippery the surface and the less friction is 
exerted on a fluid flowing across it. Slip lengths 
have been measured previously for water flow 
through aligned arrays of carbon nanotubes 
of different radii, but the values obtained 
differed by several orders of magnitude.
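
One way to see what a slip length means for flow is the textbook result for pressure-driven flow in a cylindrical tube with a Navier slip boundary condition: the no-slip (Hagen-Poiseuille) flow rate is multiplied by (1 + 4b/R), where b is the slip length and R the tube radius. The sketch below uses that relation only as an illustration. It is not the analysis performed by Secchi and colleagues, and the radius, tube length, pressure drop and slip lengths are hypothetical values.

```python
import math

WATER_VISCOSITY = 1.0e-3   # Pa*s, approximate value for water near room temperature


def no_slip_flow(radius, length, pressure_drop, viscosity=WATER_VISCOSITY):
    """Hagen-Poiseuille volumetric flow rate (m^3/s) with no slip at the wall."""
    return math.pi * radius**4 * pressure_drop / (8 * viscosity * length)


def slip_enhancement(slip_length, radius):
    """Factor by which a Navier slip length b boosts the no-slip flow: 1 + 4b/R."""
    return 1 + 4 * slip_length / radius


# Hypothetical geometry and driving pressure, chosen only for illustration.
radius = 15e-9         # m
length = 1e-6          # m
pressure_drop = 1e5    # Pa (about 1 bar)

base = no_slip_flow(radius, length, pressure_drop)
for slip_length in (0.0, 15e-9, 300e-9):   # hypothetical slip lengths, in metres
    flow = base * slip_enhancement(slip_length, radius)
    femtolitres_per_second = flow * 1e18   # 1 m^3 = 1e18 fL
    print(f"b = {slip_length * 1e9:5.0f} nm -> {femtolitres_per_second:7.1f} fL/s")
```

Because the enhancement scales as b/R, the same slip length matters far more in a narrow tube than in a wide one.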

Secchi and co-workers’ measurements of 
flow through individual nanotubes help to 
reconcile some of the previous measurements 
by revealing a strong dependence of slip length 
on nanotube radius. In addition, the measure- 
ments confirm that carbon surfaces are indeed 
unusually slippery, allowing almost frictionless
flow through the tubes with the smallest
radius (approximately 15 nanometres). The
authors also observed that boron nitride nanotubes
are rather sticky compared with carbon
nanotubes: for the range of radii considered
(about 10-30 nm), water does not flow anywhere
near as freely through boron nitride
nanotubes, and the flow is almost at the detection
limit of the experimental set-up.

By providing a deeper understanding of 
well-defined aqueous interfaces, these meas- 
urements might aid the design of improved 
membranes and nanofluidic devices. The 
results also create opportunities and challenges 
for computer simulations of fluid motion. For 
example, the quantitative measurements of 
slip length can serve as a benchmark against 
which computer simulations can be verified. 
This is important, because understanding how 
well computers simulate interfacial water is rel- 
evant not just to potential applications such as 
membranes and water desalination, but also to 
fields such as the atmospheric sciences, energy 
production and storage, and catalysis. 

But an explanation is needed for the rela- 
tive stickiness of nanotubes made of boron 
nitride compared with carbon. Only modest 
disparities in the behaviour of water at these 
two materials are expected on the basis of 
their similar structures and from previous 
simulation studies, including reference-
quality quantum-mechanical simulations.
The huge differences observed by Secchi 
et al. imply that factors such as water dis- 
sociation (the break-up of water molecules 
into their constituent parts), ion adsorp- 
tion to the nanotubes, nanotube defects and 
defect-induced chemistry, or gating effects at 
the ends of the nanotubes might have a role
in determining water flow. Resolving which
factors are involved will require further experi- 
ments and high-quality quantum-mechanical 
simulations. 

To extend this work for desalination appli- 
cations, it will be essential to understand the 
connection between water flow and (salt) ion 
motion. More broadly, the authors’ experi- 
mental approach could readily be applied 
to nanofluidics in general, by examining the 
flow of different liquids through different 
materials. If the sensitivity of the technique 
can be improved, then studies of water flow 
through the pores in biological membranes — 
including the most efficient water filter of 
all, the aquaporin protein — should also be 
within reach.


Angelos Michaelides is at the Thomas 
Young Centre, Department of Physics and 
Astronomy, and at the London Centre for 
Nanotechnology, University College London, 
London WC1H 0AH, UK.

e-mail: angelos.michaelides@ucl.ac.uk 
CONSERVATION

Mapping the terrestrial
human footprint 


An analysis of direct human impacts across Earth’s land surface using global 
satellite images and ground surveys reveals the scale of the ‘human footprint’ 
on the world and its changes between 1993 and 2009. 


PHILIP J. K. MCGOWAN 


Humanity is causing unprecedented
changes to Earth, such that we may be
entering a human-dominated geological
era termed the Anthropocene¹,² and transgressing
the environmental boundaries within
which we can live safely³,⁴. The impact of the
growing extent and intensity of human influ- 
ences on our landscapes is reflected in changes, 
usually of loss and degradation, in natural 
habitats and in the species that they contain. 
We need to understand not only where human
pressures occur, but also where they are greatest
and how they change over time. Taking
advantage of the availability of global data sets
on a range of human pressures across 16 years,
papers by Venter et al. in Nature Communications⁵
and Scientific Data⁶ provide the first
analysis of this changing ‘human footprint’
on the world’s terrestrial landscape.

Humans exert pressures on the planet in a
great many ways that may lead directly or indirectly
to changes in natural systems (Fig. 1).




The first step in documenting where 
human pressures act across the world 
was taken in 2002 by Sanderson et al.⁷
with the development of a framework 
to map the human footprint using eight 
global data sets of human activities. Con- 
structing such a map presents profound 
challenges because of the complexity of 
our impacts on the planet. Therefore, 
some decisions had to be made about 
what to include when developing their 
map of the world’s human footprint, and 
Venter and colleagues have followed the 
approach taken by Sanderson et al.⁷.

Venter et al. confined their analysis 
to the terrestrial landscape because 
assessing the footprint in the marine 
environment would require a dif- 
ferent approach and data sets. They 
concentrated on direct, rather than 
indirect, measures of human influence 
for which data were available. Only
data sets that had global coverage, were
easily accessible and were of sufficient
quality were used. Antarctica and many
oceanic islands that
were absent from these global data sets 
were excluded. These decisions sought 
to match current data availability with 
the ambition of developing a global 
framework for assessing human impacts 
on the terrestrial environment. 

Figure 1 | Human footprints across Earth. Venter et al.⁵,⁶ assessed the scale of change of
human impacts across the globe between 1993 and 2009 by analysing satellite images or using
ground-survey data. Five aspects of human activity were monitored: the presence of built
environments; areas of population density; navigable waterways (top, satellite image of Venice,
Italy); areas of crop growth (middle, vineyards near Huelva, Spain); and electrical
infrastructure such as artificial lights (bottom, Shanghai, China).

The authors have built substantially on the work of Sanderson and colleagues
and have brought it up to date by analysing the most recently available
comprehensive data sets, and by adding an assessment of human footprint changes
over time. Furthermore, Venter and colleagues provide a service for the future
by clearly describing all data sets⁶ and how they were used. This allows easy
access to the data and methods so that the approach can be developed, and it
will enable changes in patterns of human influence to be assessed in the future
using data available at the time.

The heart of Venter and colleagues’ work lies in combining data sets on several
pressures to produce an assessment of how human influences accumulate, an
approach that the authors say is more indicative of the totality of direct
human pressures than is producing maps of single pressures, some of which are
easier to detect than others. The result is a ‘cumulative threat map’ and a
human footprint that represents an accumulation of a range of pressures.
Venter et al. identified eight data sets representing human population density,
land transformation, human transit routes and electrical power infrastructure
to serve as proxies for this footprint. Some data were remotely sensed and
others collected through ground surveys.

Three data sets (for pasturelands, roads and railways) were not available for
the dates needed by the authors to make comparisons over time and were
therefore not used when assessing change. This exemplifies the challenge of
addressing the fundamental question of how human influence on the planet’s
terrestrial landscapes has changed. If a study was commissioned to address this
from scratch, it would not get off the ground because of the scale of data
collection required. Together, the two studies by Venter and colleagues
represent a pragmatic approach to these challenges.

The headline findings are that direct impacts of human development can be
measured in 75% of the world’s terrestrial systems, and that the human
footprint increased by 9% between 1993 and 2009, during which time the human
population increased by 23% and the global economy increased by 153%. The
comparison of footprints reveals intriguing insights, all of which merit
further analysis and, potentially, should be brought to the attention of
those who make policies and decisions, including governments. For example,
‘pressure-free lands’ are now restricted to high northern latitudes, some
deserts and the most distant parts of the Amazon and Congo rainforests. The
change in the footprint over that period varies with geography and habitat.
Areas such as the North American tundra, most New Guinea forests and some
forests in the Neotropics (the tropical part of the American continent) showed
the biggest increases in human impact.

Although these findings provide food for thought, they will also trigger
questions regarding caveats and qualifiers about the data that are available
and how adequately they reflect human pressures on terrestrial ecosystems. This
is inevitable when tackling an issue that is so complex in lots of different
ways. Rather than diminishing such work, this should spur us on to improve both
its conceptual basis and its technical execution, so that an even better map of
human influence across the world’s land masses can be developed in the future.

Earth is being changed substantially, and we need ways to both understand and
communicate how human pressures on the planet combine. Venter et al. have
created a framework that will allow researchers to track a range of direct
pressures and, crucially, provide information that could be relevant to those
who make high-level policy decisions. However, we do need to add to this
framework. For example, ecologists have no single metric yet for measuring the
influence of hunting across terrestrial systems, and, given the huge pressures
from over-exploitation of species⁸ and escalating pressures from the illegal
trade in wildlife⁹, this would be a key step forward. It would be fascinating,
and probably alarming, to see how such a metric might change the human
footprint map.


Philip J. K. McGowan is at the School
of Biology, Newcastle University,
Newcastle upon Tyne NE1 7RU, UK.
e-mail: philip.mcgowan@newcastle.ac.uk


1. Crutzen, P. J. & Stoermer, E. F. IGBP Global Change Newsl. 41, 17-18 (2000).
2. Corlett, R. T. Trends Ecol. Evol. 30, 36-41 (2015).
3. Rockström, J. et al. Nature 461, 472-475 (2009).
4. Steffen, W. et al. Science 347, 1259855 (2015).
5. Venter, O. et al. Nature Commun. http://dx.doi.org/10.1038/ncomms12558 (2016).
6. Venter, O. et al. Sci. Data http://dx.doi.org/10.1038/sdata.2016.67 (2016).
7. Sanderson, E. W. et al. BioScience 52, 891-904 (2002).
8. Maxwell, S. L. Nature 536, 143-145 (2016).
9. United Nations Environment Assembly. Resolution 2/14 Illegal trade in wildlife and wildlife products. Available at go.nature.com/2bzmsdv


8 SEPTEMBER 2016 | VOL 537 | NATURE | 173 
© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


| RESEARCH | NEWS & VIEWS 


ASTROPHYSICS 


Violent emissions 
of newborn stars 


Interactions between young stars and their parent molecular clouds are poorly 
understood. High-resolution observations of the Orion nebula now reveal these 
interactions, which have implications for star formation. SEE LETTER P.207 


MARKUS RÖLLIG


Stars are not static objects: they form,
evolve and are then destroyed. In galaxies
such as the Milky Way, the gas and dust
between stars, which comprise the interstellar 
medium, accumulate in giant molecular 
clouds. The densest parts of these clouds even- 
tually collapse under their own weight to create 
stars. On page 207, Goicoechea et al. report 
their latest observations of one of the closest 
stellar nurseries, the Orion molecular cloud. 
They find evidence of strong interactions 
between young massive stars and the cloud, 
shedding light on some unknown aspects of 
star formation. 

The general process of star formation 
is fairly well understood, but many details 
remain a mystery. In particular, there is a lack 
of information on the formation of the most 
massive stars (those about 8-150 times more 
massive than the Sun). Such massive stars are 
rare, but they are the primary sources of light 
in the Milky Way — some are a few hundred 
thousand times more luminous than the Sun.

When massive stars form, they start to emit 
energetic radiation, largely in the ultraviolet 
region of the electromagnetic spectrum. This 
UV radiation destroys the mole- 
cules in the surrounding cloud, cre- 
ating a layer of atomic gas around 
young massive stars. In this layer, 
in the region closest to the star, the 
radiation is energetic enough to 
ionize the atoms, forming a bubble 
of ionized gas. At the edge of this 
bubble, the most energetic UV pho- 
tons have already been absorbed, 
and the atomic gas can survive. The 
transition zone between the edge of 
the bubble and the molecular gas is 
called the photodissociation region 
(PDR). Just like human skin, the
PDR protects molecules in the 
cloud from harmful UV radiation. 

Goicoechea and colleagues show 
that it is possible to observe a PDR 
with sufficient resolution to directly 
study how the molecular cloud is 
pushed away and dispersed by the 
stellar radiation and winds of young 
massive stars. The authors use the
Atacama Large Millimeter/submillimeter
Array (ALMA) in Chile to study a PDR of the 
Orion molecular cloud: the Orion Bar (Fig. 1). 
The data set presented by Goicoechea et al. and 
the level of detail revealed by the ALMA obser- 
vations are unparalleled, allowing unknown 
aspects of star formation to be explored. 

The Orion Bar is an archetypal PDR that 
makes an ideal study candidate for two rea- 
sons: it is close to Earth, and it is oriented 
edge-on, which allows astronomers a good 
look at how radiation is absorbed as it enters 
the molecular cloud. The local gas den- 
sity in the PDR controls how quickly this 
absorption occurs. In low-density gas (a 
few hundred to a few thousand gas particles 
per cubic centimetre), the medium gradu- 
ally becomes opaque to radiation, whereas 
denser gas (a few million particles per cubic 
centimetre) becomes opaque much more suddenly.
In the Orion Bar, this gradual absorption
happens on a scale of 15 arcseconds
(equivalent to less than 1% of the full Moon's 
angular diameter) before the UV radiation is 
sufficiently absorbed so that the molecules 
can survive. 
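
The comparison with the Moon can be checked with a short calculation, and the same small-angle reasoning converts an angle on the sky into a physical length. In this sketch, the mean angular diameter of the full Moon (about 31 arcminutes) and the distance to the Orion nebula (about 414 parsecs) are assumed values that do not appear in the text above.

```python
import math

ARCSEC_PER_DEGREE = 3600
MOON_DIAMETER_ARCSEC = 31 * 60   # assumed mean full-Moon diameter: about 31 arcminutes
ORION_DISTANCE_PC = 414          # assumed distance to the Orion nebula, in parsecs
AU_PER_PARSEC = 180 * ARCSEC_PER_DEGREE / math.pi   # about 206,265 au in one parsec


def fraction_of_moon(angle_arcsec):
    """Angle expressed as a fraction of the full Moon's angular diameter."""
    return angle_arcsec / MOON_DIAMETER_ARCSEC


def physical_size_au(angle_arcsec, distance_pc):
    """Small-angle approximation: 1 arcsecond at 1 parsec spans 1 astronomical unit."""
    return angle_arcsec * distance_pc


scale_arcsec = 15   # absorption scale quoted in the text
fraction = fraction_of_moon(scale_arcsec)
size_au = physical_size_au(scale_arcsec, ORION_DISTANCE_PC)
size_pc = size_au / AU_PER_PARSEC

print(f"{fraction:.1%} of the Moon's angular diameter")
print(f"about {size_au:.0f} au, or {size_pc:.3f} pc, at the assumed distance")
```

Under these assumptions the 15-arcsecond scale is indeed below 1% of the Moon's diameter and corresponds to a few thousand astronomical units at the distance of the cloud.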

Figure 1 | The Orion Bar. In this image of the Orion nebula taken by the
Hubble Space Telescope, the Orion Bar is the bright ridge at the bottom left.
Goicoechea et al. use high-resolution images of the Orion Bar to study the
impact of young stars on their parent molecular clouds.

The measurement of 15 arcseconds
surprised astronomers because standard
models of PDRs can explain this value only if
the gas in the Orion Bar has a low density. 
However, radiation observed from the Orion 
Bar requires high-density gas (a few million 
particles per cubic centimetre) to explain 
its emission. In theoretical models, such
a high density would require a smaller dis- 
tance than 15 arcseconds between the atomic 
and molecular gas layers. In other words, 
UV radiation is observed to penetrate deeply 
into the cloud, whereas it should be absorbed 
by high-density gas. 

Earlier studies⁸,⁹ tried to reconcile this
discrepancy by suggesting that the Orion Bar 
consists of clumps of dense gas embedded in a 
thinner gas. Such a structure would allow for 
both high-density molecular emission and 
deeper penetration of UV radiation into the 
cloud. Goicoechea et al. are the first to directly 
observe such clumps of dense gas in the Orion 
Bar. Their results have strong implications for 
models of PDRs, because they demonstrate 
that even such an archetypal PDR does not 
have the stratified transition between atomic 
and molecular gas layers that was previously 
assumed.

The authors’ results also provide some 
explanation for the evolution of the Orion Bar. 
They find evidence of a high-pressure wave 
expanding into the molecular cloud, which 
is consistent with the picture of an expand- 
ing bubble of ionized gas created by the young 
massive star in its centre. The bubble pushes 
against the molecular cloud, compressing 
dense regions, while dispersing less-dense 
regions. However, because of the experimen- 
tal limitations of ALMA, Goicoechea and 
colleagues only observe a small region of the 
Orion Bar, in a snapshot of time and at a lim- 
ited wavelength. To rule out the possibility that 
the authors observed an atypical region with 
respect to PDRs in general, it will be necessary 
to consider a larger sample size, 
including PDRs with various local 
physical conditions. 

An expanding bubble of ionized gas is one of the prime candidates proposed to
explain how the interaction of young stars with their parent interstellar
clouds controls the efficiency of star formation¹⁰. Without these interactions,
star formation would be about 10 to 100 times more efficient than what is
observed. The detailed nature and relative importance of these interactions
with respect to other factors that influence star formation remain largely
unknown. Therefore, any direct observation of these processes, as presented by
Goicoechea and colleagues, provides a step towards a better understanding of
star formation and, consequently, of how the Sun and the Solar System formed.


Astron. Astrophys. 294, 792-810 (1995).
8. Stutzki, J., Bensch, F., Heithausen, A., Ossenkopf, V. & Zielinsky, M. Astron. Astrophys. 336, 697-720 (1998).
9. Andree-Labsch, S., Ossenkopf, V. & Röllig, M. Preprint at: http://arxiv.org/abs/1405.5553 (2014).
10. Krumholz, M. R. et al. in Protostars and Planets VI (eds Beuther, H., Klessen, R. S., Dullemond, C. P. & Henning, T.) 243-266 (Univ. Arizona Press, 2014).


Catalytic spliceosome captured


Spliceosome complexes remove non-coding sequences from RNA transcripts in two
steps. A structure of a spliceosome after the first step reveals active-site interactions
and evolutionary constraints on these non-coding regions. SEE ARTICLE P.197


BRIAN KOSMYNA & CHARLES C. QUERY


The presence of non-coding sequences called introns in nascent RNA transcripts
is a defining characteristic of the genomes of eukaryotic organisms, which
include plants, animals and fungi. Intron removal by a spliceosome complex is
an essential step in gene expression and regulation. Decades of biochemical and
genetic studies have provided detailed insights into the composition of these
complexes and the RNA structures within them. However, the dynamic nature of
the complexes has hindered efforts at modelling and structure determination at
atomic resolution. On page 197, Galej et al. present a structure of the
catalytic spliceosome at 3.8 ångströms resolution, obtained using
single-particle cryo-electron microscopy (cryo-EM). This structure not only
provides evidence in support of reported interactions that bind and position
catalytic metal ions, but also reveals previously unknown molecular features of
splicing catalysis.

The spliceosome is a large, dynamic RNA-protein complex that catalyses intron
removal in two sequential chemical reactions (Fig. 1). The chemical mechanism
of intron removal, as well as the core spliceosomal RNAs and proteins, are
highly evolutionarily conserved in most eukaryotes. The first reaction cleaves
the nascent transcript at the 5’ end of the intron (the 5’ splice site; 5’SS),
causing the intron to form a lasso-shaped, or ‘lariat’ structure. Compositional
and structural changes in the spliceosome then occur, whereupon the second
reaction joins together (ligates) the coding exon sequences that flank the
intron, simultaneously generating the mature messenger RNA and excising the
intron lariat.

[Figure 1 panels: a, Before first step (Bᵃᶜᵗ); b, After first step (C); c, After second step (C*); d, After mRNA release (ILS).]

Figure 1 | Spliceosomal processing of RNA transcripts. The spliceosome
complex catalyses splicing: the removal of non-coding intron sequences
(red) from RNA transcripts and the joining together of coding exon
sequences. a, The spliceosome (not shown) and transcript form the Bᵃᶜᵗ
complex, a fully assembled but catalytically inactive complex. The
adenosine nucleotide (A) at the intron’s ‘branch site’ is far from the 5’ splice
site (5’SS) at one end of the intron. G and U represent two nucleotides,
guanosine and uridine, of the 5’SS. b, The Prp2 enzyme facilitates transition to
the C complex, which catalyses the first step of splicing: cleavage of
the 5’SS from the adjacent exon and formation of a lariat structure, in
which the branch-site A bonds covalently to the 5’SS-GU. c, The Prp16
enzyme drives formation of the C* complex, which catalyses the second
splicing step: cleavage of the 3’ splice site and joining together of the two
exons to form a mature messenger RNA. d, Finally, the Prp22 enzyme
releases the mRNA from the spliceosome and generates a stable intron lariat
spliceosome (ILS).

Technological and computational advances in cryo-EM have led to the structural
determination of many spliceosomal complexes within the past year. To obtain
particles for their study, Galej and co-workers assembled spliceosomes in vitro
on RNA substrates that can proceed through the first catalytic step, but not
the second one. They then purified the resulting complexes using proteins that
interact with the spliceosome only after the first step has occurred. The
reported structure (Fig. 2) therefore represents complexes that form
immediately after the first catalytic step. This, along with another recently
published structure, is the most relevant structure to splicing catalysis
available.

Spliceosomal complexes follow an intricate pathway during assembly, catalysis
and recycling, characterized by compositional and structural changes (Fig. 1).
Enzymes known as ATPases facilitate many of the transitions between spliceosome
complexes. The ATPase Prp2 remodels the fully assembled but inactive complex
(Bᵃᶜᵗ) to form the catalytically active complex (C). Another ATPase, Prp16,
removes the intron lariat from the active site after the first reaction, and
positions the 3’SS near to the 5’SS to allow exon ligation. Once the second
reaction has excised the intron, the ATPase Prp22 binds the 3’ exon and moves
along the mRNA, thus releasing the mRNA from the spliceosome.

Shi and colleagues recently reported a structure in which Prp2 is bound in the
Bᵃᶜᵗ form of the spliceosome, whereas, in Galej and co-workers’ structure, Prp2
has been replaced by Prp16 in the C complex. These structures are the first
visualizations of these two ATPases bound to spliceosomes. Both enzymes are
positioned similarly in the overall topology of the spliceosome near the 3’ end
of the intron. On the basis of the interactions between Prp16 and the
spliceosome observed in their structure, Galej et al. suggest that splicing
factors unique to each complex recruit the specific ATPase needed (see Fig. 6
of the paper). Hydrolysis of ATP molecules
by the ATPases could subsequently destabi- 
lize the associated splicing factors, allowing 
the RNA structures in the catalytic core to be 
remodelled. 

The two substrates for the first catalytic 
step are the 5’SS and an adenosine nucleotide, 
known as the branch site, within the intron. 
Although it has long been thought that these 
two substrates almost certainly interact with 
each other, to help bring them together as 
needed for the catalytic step, neither evidence 
nor models for such an interaction existed. 
Galej and co-workers’ structure reveals 
intimate interactions between the 5’SS (spe- 
cifically, its GU sequence, which consists of a 
guanosine nucleotide next to a uridine nucle- 
otide) and the sequence flanking the branch 
site; these interactions help to explain the evo- 
lutionary conservation of the two sequences. 
For example, an RNA base triple (a structure 
analogous to a base pair, but involving three 
bases) was identified between the uridine of 
the 5’SS-GU and the helix created by base pair- 
ing between the intron sequence flanking the 
branch site and U2, one of the small nuclear 
RNAs that forms the spliceosome’s active site. 
This base triple helps to position the 5’SS near 
the branch-site adenosine, as required for the 
first catalytic step. 

By contrast, in Shi and colleagues’ structure’, 
the 5’SS and branch site are separated by a large 
distance (approximately 49 Å). The guanosine
of the 5’SS-GU is protected by a pocket formed 
by a protein subunit of the spliceosome and 
a first-step splicing factor. Analogously, the 
branch-site adenosine is positioned in a posi- 
tively charged pocket of another protein subu- 
nit (SF3B1, which is highly mutated in human 
cancers’’). These two pockets protect the reac- 
tive groups involved in the first catalytic step 
until the spliceosome has transitioned to a 
catalytically active conformation. 

Galej and colleagues’ structure also helps 
to explain the evolutionary sequence con- 
servation of the branch site-U2 duplex by 
revealing another base triple interaction 
between the branch-site adenosine and the 
intron–U2 RNA helix two nucleotides away.
This was presaged in part by interactions 
observed between the branch-site adenosine 
and the intron-U2 RNA helix in an RNA- 
only structure"! previously determined by 
nuclear magnetic resonance spectroscopy. 
This base triple positions the reactive hydroxyl 
group of the branch-site adenosine outward 
towards the 5’SS. 

Figure 2 | Model of the catalytically active
spliceosome structure. Galej et al.¹ report the
structure of the spliceosome in complex with an
RNA substrate immediately after the first catalytic
step of splicing. Most of the spliceosome complex
is shown as a fainter surface representation
(different colours represent different components).
The three small nuclear RNAs (U2, U5 and U6)
that form the active site are shown in bold, as are
the intron and 5’ exon of the RNA substrate.

The structural insights obtained through
the identification of hundreds of RNA-protein
and protein-protein interactions in the new
structures¹,² suggest innumerable biochemi-
cal and genetic experiments to ascertain which
splicing step these interactions contribute most
to, and for what intron features they are most
important. The stage is now set for the explora-
tion and discovery of many other spliceosome
structures. Like the explosion of successes that
followed the determination of the ribosome 
structure’ (the protein-synthesis apparatus), 
we eagerly await structures not just for nor- 
mal spliceosome complexes, but also for com- 
plexes that include mutations in pre-mRNA 
substrates or in spliceosomal components,
such as those found in many cancers’’. The 
future will allow a more comprehensive pic- 
ture of the basic mechanisms of splicing cataly- 
sis, and of how splice sites are recognized and 
catalysis is regulated. Other achievements may 
also include the determination of features vital 
to the alternative splicing regulation found in 
complex organisms.
within our grasp


There was thought to be little in common between fish fin bones and the 
finger bones of land-dwellers. But zebrafish studies reveal that hox genes have a 
surprisingly similar role in patterning the two structures. SEE LETTER P.225 


ADITYA SAXENA & KIMBERLY L. COOPER 


The next time you gaze at fish in an
aquarium, or order a whole trout at your 
favourite restaurant, you may wish to 
ponder how the dozens of thin, delicate bones 
in the fish pectoral fins that lie just behind the 
gills compare with your own fingers. Although 
scientists have long known that the human arm 
evolved from the pectoral fin of our fish ances- 
tors, the relationship between the bones of the 
two strikingly different skeletons has remained 
mysterious. Nakamura et al.¹ address this issue
on page 225 and provide evidence that fish 
fin-ray bones and human fingers have more 
in common than was previously thought. 
There are two types of bone, and they form 
in different ways. Most of the bones in our
skeleton, including our limbs, start out in 
the embryo as rod-shaped pieces of cartilage 
that build a mineralized scaffold on which the 
bone grows, in a process known as ossification.
Bone that develops using a cartilage template 
is called endochondral bone and includes the 
short, broad radial fin bones in fish. 

The other type of bone is dermal bone, 
which is found in human shoulder blades and 
in the plate-like bones that form the roof of 
our skulls. Dermal-bone formation does not 
use a cartilage scaffold, but instead proceeds 
by depositing bone material directly on the 
innermost layer of skin, the dermis. Although 
the fin rays of fish and the bones of our fingers 
may seem superficially similar because they 
are both rod-like structures oriented away
from the body, fish fin rays are dermal bones,
whereas our fingers are endochondral bones.

Figure 1 | Fin and limb development. Fin rays and mouse digits are formed from different types of
bone: mouse digits are made of endochondral bone and fish fins are made of dermal bone. The genes
hoxa13 and hoxd13 are expressed in cells that will become the mouse digits, and mice with mutations
in these two genes do not form wrist or digit structures in their forelimb’. Nakamura et al.¹ assessed
the effect of loss-of-function mutations of hoxa13 and hoxd13 in zebrafish, and found that the mutant
fish fins had dermal-bone structures that were reduced in length and had extra endochondral-bone
structures, indicating that these hox13 genes are required for both tetrapod digits and fish fin rays.

It has long been thought’ that the digits of 
our earliest four-legged (tetrapod) vertebrate 
ancestors were a structural innovation when 
they first appeared in aquatic species, and 
that fin rays were lost. Digits form at the end 
of a limb skeleton that has three segments: 
the upper arm, the lower arm, and the wrist 
and finger area (also known as the autopod). 
Formation of these segments in the develop- 
ing embryo depends on the function of a few 
key members of the large group of Hox-family 
transcription-factor proteins’. 

In tetrapods, regions of hox gene expression 
shift in space and time from an early pattern 
of nested areas across the anterior–posterior
axis of the developing limb to a late pattern that 
is characterized by restriction of hoxa13 and 
hoxd13 expression to the autopod region*”. 
Zebrafish also express hox genes in the cells that 
will form the endochondral skeleton®*. How- 
ever, in transgenic mouse embryos, none of 
the identified regulatory DNA sequences 
of the zebrafish hox genes seem to be active 
in the region where digits will form’*. This 
had led researchers to think that the wrist and 
digits were a tetrapod innovation that arose 
as a result of a newly acquired region of hox 
expression. 

The zebrafish is the most commonly used 
model fish, for which well-established genetic 
approaches and laboratory techniques are 
available. However, among the fishes, zebrafish 
are said to be highly derived because they have 
evolved many traits that are not thought to 
have been present in ancestral species. The 
spotted gar fish and zebrafish share a com- 
mon ancestor with tetrapods, but, in some 
ways, the spotted gar has changed less than the 
zebrafish in comparison with their ancestor. 
An enhancer DNA sequence for hoxA from the 


spotted-gar genome can promote a late phase 
of gene expression both in the digit-forming 
region of developing mouse limbs and, surpris- 
ingly, in the distal fin of zebrafish®. Fish don’t 
have fingers, so what do these cells become in 
the zebrafish that can respond to the same reg- 
ulatory sequence that is active in developing 
mouse digits? 

To answer this question, Nakamura and 
colleagues used the spotted-gar hoxA enhancer 
DNA sequences to develop a genetic marker 
system with which to trace the development 
of the population of cells near the tip of the 
zebrafish fin that respond to the enhancer. The 
authors found that these cells go on to contrib- 
ute exclusively to the dermal skeleton of the fin 
rays. Although this is not evidence that fin rays 
and mouse digits are the same, or even that 
tetrapod digits evolved from the rays of fish, 
it does show that there is much more similar- 
ity between the structures than was previously 
thought. This further supports the hypothesis 
that autopod evolution may have occurred by 
the hijacking of some of the developmental 
processes that were already shaping the fins of 
our ancestors. 

The hoxa13 and hoxd13 genes are more 
than mere identifiers of the developing tetra- 
pod digits; they are also essential for autopod 
development, and mice that lack the two 
proteins encoded by these genes do not form 
autopods’. However, testing the role of these 
genes in zebrafish has been difficult because 
the species has undergone full-genome dupli- 
cation, and so there are multiple copies of many 
genes. This can hinder loss-of-function stud- 
ies using conventional mutation and breeding 
approaches, and the effect of loss of function 
of hoxa13 and hoxd13 on the zebrafish fin was 
not known. 

To study loss of function of hoxa13 and 
hoxd13 in zebrafish, Nakamura and col- 
leagues used CRISPR-Cas9 genome-editing 
technology, which offers a fast and specific
way to create mutations both in the hoxa13
duplicate genes (hoxa13a and hoxa13b)
and in the single copy of hoxd13. The 
resulting mutant fish have a dermal-fin- 
ray skeleton that is dramatically reduced 
in length, together with an increased 
number of distal endochondral radial 
bones (Fig. 1). 

This result is interesting because it is a trans- 
formation of the fish fin that is in some ways 
similar to what is expected to have occurred 
in the earliest tetrapods that lost their dermal- 
fin-ray skeleton and elaborated an endochon- 
dral skeleton to include true digits. Tetrapod 
endochondral digits were previously thought” 
to be homologous with the distal row of fish 
endochondral radial bones that are adjacent to 
the dermal-fin rays. However, the loss of rays 
and gain of true digits are thought’ to be the 
result of further elaboration, not loss, of the 
late phase of hox13 expression in tetrapods. 

Some caution should be taken in the inter- 
pretation of these data. Because zebrafish are 
highly derived compared with more-basal 
fishes, it is possible that the role of hox13 
transcription factors in the development of fin 
rays is a recent zebrafish acquisition. It will be 
important, where possible, to perform some 
of the same fate-mapping and gene loss-of- 
function experiments in fish species, such as 
the paddlefish and gar, that diverged closer to 
the shared ancestor with tetrapods and that 
have fin skeletons with more similarities to 
ancestral tetrapods. Fortunately, these exciting 
questions are emerging just as CRISPR-Cas9 
genome-editing technologies are becom- 
ing options for a variety of unusual model 
species. The answers may soon be within 
our grasp.
Photocontrol of fluid slugs in liquid
crystal polymer microactuators 


Jiu-an Lv¹, Yuyun Liu¹, Jia Wei¹, Erqiang Chen², Lang Qin¹ & Yanlei Yu¹


The manipulation of small amounts of liquids has applications ranging from biomedical devices to liquid transfer. Direct 
light-driven manipulation of liquids, especially when triggered by light-induced capillary forces, is of particular interest 
because light can provide contactless spatial and temporal control. However, existing light-driven technologies suffer 
from an inherent limitation in that liquid motion is strongly resisted by the effect of contact-line pinning. Here we 
report a strategy to manipulate fluid slugs by photo-induced asymmetric deformation of tubular microactuators, which 
induces capillary forces for liquid propulsion. Microactuators with various shapes (straight, ‘Y’-shaped, serpentine and 
helical) are fabricated from a mechanically robust linear liquid crystal polymer. These microactuators are able to exert 
photocontrol of a wide diversity of liquids over a long distance with controllable velocity and direction, and hence to 
mix multiphase liquids, to combine liquids and even to make liquids run uphill. We anticipate that this photodeformable 
microactuator will find use in micro-reactors, in laboratory-on-a-chip settings and in micro-optomechanical systems. 


Manipulating small amounts of liquids to perform reactions, analysis 
or fundamental investigations in biology, physics or chemistry is of 
great interest in both scientific research and practical applications! >. 
Conversion of light energy to liquid motion is a new paradigm for 
the actuation of microfluidic systems by using optical forces (through 
radiation pressure and optical tweezers)’, light modulation of 
electrical actuation (optoelectrowetting and photocontrol of electro- 
osmotic flow)*!” or light-induced capillary forces'*-'°. The last of 
these actuation approaches has advantages over the first two in that it 
requires neither special optical set-ups nor complex microfabrication 
steps!: it uses capillary forces generated from a light-induced wettability 
gradient and Marangoni effects. However, the capillary force arising 
from a wettability gradient is too small to overcome the effect of 
contact-line pinning, so the motion is limited to specific liquids over a 
relatively short distance, in simple linear trajectories, and at low speed 
(10–50 μm s⁻¹)¹³⁻¹⁵. And use of the light-induced Marangoni
effect requires either local heating or the addition of photosensitive 
surfactants to liquids, which is undesirable for biomedical applications 
and undoubtedly produces sample contamination!!*!. 


Design of tubular microactuators 
It is well known that a completely wetting liquid droplet confined in 
a conical capillary is self-propelled towards the narrower end because 
of the axial force arising from differing curvature pressures across its 
end caps”?”!. If we were able to build tubular microactuators whose 
geometry could be dynamically adjusted by light, a simple and straight- 
forward method to manipulate liquids would be achieved; therefore, 
a smart material capable of photodeformation is crucial to building 
such tubular microactuators. Photodeformable crosslinked liquid 
crystal polymers are ordered polymers that show large and reversible 
deformation through the orientation change of liquid crystals (LCs) 
and allow temporal, localized, remote and isothermal triggering and 
actuation***”. Hence, crosslinked liquid crystal polymers are good 
candidates for actuators for precise and direct manipulation of liquids 
through photodeformation. 
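
The self-propulsion in a tapered tube described above can be illustrated with a short numerical sketch. It applies the standard Young-Laplace relation for a spherical-cap meniscus to the two ends of a slug and ignores the small correction from the cone half-angle. The surface tension and the two radii are hypothetical values chosen only for illustration; they are not taken from the paper.

```python
import math


def laplace_pressure_difference(gamma, r_narrow, r_wide, contact_angle_deg=0.0):
    """Net driving pressure (Pa) on a wetting slug in a gently tapered tube.

    Each meniscus lowers the liquid pressure by 2*gamma*cos(theta)/r, so the
    liquid is pulled towards the end with the smaller radius.
    gamma: surface tension in N/m; r_narrow, r_wide: tube radii in m.
    """
    cos_theta = math.cos(math.radians(contact_angle_deg))
    return 2.0 * gamma * cos_theta * (1.0 / r_narrow - 1.0 / r_wide)


# Hypothetical numbers: a fully wetting liquid (contact angle 0) with
# gamma = 0.02 N/m in a tube whose radius narrows from 250 um to 240 um.
gamma = 0.02
r_wide = 250e-6
r_narrow = 240e-6

dp = laplace_pressure_difference(gamma, r_narrow, r_wide)
print(f"Driving pressure towards the narrow end: {dp:.2f} Pa")
```

With these assumed values the driving pressure is a few pascals, and it vanishes when the two radii are equal, which is why a light-induced change of tube geometry is enough to start and stop the motion.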

Unfortunately, to the best of our knowledge, tubular microactuators 
(TMAs) have not yet been fabricated from existing photodeformable 


crosslinked liquid crystal polymers, since they show poor processibility 
(incompatible with common solution and melt processing) owing to 
chemical crosslinking**°. Here we report robust TMAs prepared 
from a newly designed linear liquid crystal polymer (LLCP) that 
show asymmetric geometry change upon irradiation by 470-nm light 
with an intensity gradient along the TMA (Fig. 1a); such irradiated 
TMAs can thus successfully manipulate liquid motion by light (Fig. 1b, 
Supplementary Video 1). (For brevity, we refer to light that has an
intensity gradient as ‘attenuated light’, as this gradient is produced by
varying attenuation.) The critical design premise of the LLCP includes 
ensuring enough mechanical robustness without chemical crosslinks, 
and the facile attainment of macroscopic LC orientation. 

Arteries are natural robust soft actuators which are capable of with- 
standing impressive pressure stress, displaying rupture strengths up to 
2,000 mm Hg (ref. 40). In the wall of an artery, the components of the 
tunica media, alternate muscle layers and elastic layers, are responsible 
for stimuli-responsive deformation and mechanical robustness*!, 
respectively (Fig. 1c). Inspired by the lamellar structure of artery 
walls, we designed the novel LLCP, which has a long alkyl backbone 
containing double bonds and azobenzene moieties in side chains acting 
as both mesogens and photoresponsive groups (Fig. 1d). We expected 
the flexible backbones and the azobenzene mesogens to self-assemble 
into a nano-scaled lamellar structure due to the molecular cooperation 
effect of LCs. In addition, the long spacers provide enough free volume 
for the azobenzene mesogens to generate a highly ordered structure and 
undergo a fast photoresponse. The transmission electron microscope 
(TEM) image in Fig. 2a clearly demonstrates that the LLCP self- 
assembles into a lamellar structure; this is confirmed by the atomic force 
microscope (AFM) image of the LLCP film (Extended Data Fig. 1). 
Two-dimensional wide-angle X-ray diffraction (2D WAXD) indicates 
that the layer spacing of the lamellar structure is 4.6 nm (Fig. 2b), and 
the azobenzene mesogens are tilted at an angle of φ = 65° in the lamellar
layers (Fig. 2c). 

In order to promote mechanical robustness, ring-opening metathesis 
polymerization—living polymerization that allows synthesis of a 
high-molecular-weight polyolefin with narrow polydispersity—was 
employed to prepare the LLCP*”. The number-average molecular 


¹Department of Materials Science, State Key Laboratory of Molecular Engineering of Polymers, Fudan University, 220 Handan Road, Shanghai 200433, China. ²Beijing National Laboratory for
Molecular Sciences, Key Laboratory of Polymer Chemistry and Physics of Ministry of Education, Peking University, Beijing 100871, China. 
Figure 1 | Design of tubular microactuators. a, Schematics showing
the motion of a slug of fully wetting liquid confined in a tubular 
microactuator (TMA) driven by photodeformation. The light is incident 
perpendicular to the long axis of the TMA and has a gradient of incident 
intensity (produced by attenuation), decreasing from left to right. Shape 
transformation of the TMA from cylindrical to conical is induced by 
this gradient of light intensity. As a result, the slug advances to the 
narrower end of the TMA. b, Lateral photographs of the light-induced 
motion of a silicone oil slug in a TMA fixed on a substrate that were 
taken through an optical filter to remove light with wavelengths below 
530 nm. On irradiation by 470-nm light whose intensity (represented by 


weight of the LLCP reached 3.6 × 10⁵ g mol⁻¹, which is at least one order
of magnitude larger than that of the generally used photoresponsive 
azobenzene LC polymers**™*. Tensile tests show that the LLCP fibre 
has a moderate elastic modulus (96 ± 19 MPa), high toughness
(319 ± 41 MJ m⁻³), high strength (~20 MPa) and a large elongation at
break (or fracture strain; 2,089% ± 275%) (Supplementary Video 2).
Thus the LLCP is a strong and tough material, which we ascribe to the 


Figure 2 | Structures of the LLCP and images of freestanding TMAs. 

a, TEM image showing the lamellar structure of the LLCP film. 

b, Two-dimensional wide-angle X-ray diffraction pattern of the LLCP 
film exhibiting lamellar reflections on the meridian and LC diffraction 
arcs in the quadrants. The X-ray beam is applied to the side of the film 
and parallel to the plane of the film. 2θ denotes the diffraction angle,

d represents the lateral distance between the LC mesogens. The yellow 
arrow indicates the film horizontal. The dashed ellipses display the outline 
of the diffraction spots in the wide angle area. The white arrow indicates 
the diffraction angle and the lateral distance values of the diffraction spots. 
c, Schematic representation of packing structure in the LLCP film. The LC 
open arrows) is attenuated increasingly from left to right (top row), the
silicone oil slug is self-propelled towards the right; when the direction of 
attenuation is reversed (bottom row), the direction of movement of the 
slug is also reversed (Supplementary Video 1). c, Schematic illustration 

of the structure of artery walls. The middle coat of an artery, called the 
tunica media, consists of alternating muscle layers and elastic layers, 
which are responsible for stimuli-responsive deformation and mechanical 
robustness, respectively. Image adapted from ref. 51, Elsevier. d, Molecular 
structure of a novel linear liquid crystal polymer (LLCP). Mn, number-
average molecular weight; Mw, weight-average molecular weight.


ordered lamellar structure and the high molecular weight. Moreover, 
the absence of a chemical network means that broken samples can be 
reshaped; a ‘healed’ fibre with a cross-sectional area of 0.02 mm² can
still sustain a large load, up to ~52 g (Extended Data Fig. 2). 
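
As a rough plausibility check on the healed-fibre figures above, the quoted load and cross-sectional area can be converted into a nominal tensile stress. The sketch below assumes that the load is a hanging mass under standard gravity and that the stress is spread uniformly over the stated cross-section; the function name is ours, not the authors'.

```python
G = 9.80665  # standard gravity in m/s^2 (assumed loading condition)


def tensile_stress_mpa(mass_g, area_mm2):
    """Nominal tensile stress in MPa for a hanging mass on a fibre.

    mass_g: suspended mass in grams; area_mm2: cross-sectional area in mm^2.
    """
    force_n = mass_g * 1e-3 * G   # grams -> kilograms -> newtons
    area_m2 = area_mm2 * 1e-6     # mm^2 -> m^2
    return force_n / area_m2 / 1e6  # Pa -> MPa


# Values quoted in the text: ~52 g carried by a 0.02 mm^2 healed fibre.
stress = tensile_stress_mpa(52, 0.02)
print(f"Nominal stress in the healed fibre: {stress:.1f} MPa")
```

Under these assumptions the nominal stress comes out at roughly 25 MPa, the same order as the ~20 MPa strength reported for the pristine fibre.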

Thanks to rational structure design and the robust mechanical 
properties of the LLCP, we were able, for the first time, to fabricate 
structurally defined and robust TMAs via a solution processing 




mesogens self-assemble into a smectic phase, and the zigzag tilting of LC 
mesogens takes place in the smectic lamellae. φ denotes the tilt angle between
the long axis of the azobenzene mesogens and the plane of the lamella.
The long axis of the azobenzene mesogen is along the black dashed line.
The normal of the smectic lamellae is perpendicular to the horizontal of the
LLCP film. d, Photographs showing, from left to right, a batch of free-standing
straight, serpentine and helical TMAs. The serpentine TMA is leaning
against the edge of a glass slide. The inner diameter of the straight TMAs is
0.5 mm, and that of both serpentine and helical TMAs is 0.6 mm. The wall
thickness of all the TMAs is ~8 μm.
method. We filled a glass capillary with a solution of the LLCP in
dichloromethane (~3 wt%). After evaporation of the dichloromethane
at 50 °C, the inner surface of the glass capillary was uniformly coated
with the LLCP. The coated capillary was annealed at 50 °C for 30 min
and then immersed in hydrofluoric acid to remove the glass. The 
free-standing TMA that was produced is robust enough to resist large 
deformation for many cycles (Extended Data Fig. 3, Supplementary 
Video 3). TMAs with arbitrary geometries, such as ‘Y’-shaped,
serpentine and helical, were also prepared by the same method. 


Photocontrol of fluid slugs 

Previously reported liquid manipulation based on light-induced 
capillary force is usually applicable only to specific liquids (oil and 
some LCs)!, and still faces great challenges in handling most commonly 
used liquids. We note that each type of our TMAs (straight, ‘Y’-shaped, 
serpentine and helical) shows unique abilities to propel a wide range of 
liquids spanning nonpolar to polar liquids, such as silicone oil, hexane, 
ethyl acetate, acetone, ethanol and water (Part 1 of Supplementary 
Video 4). More surprisingly, our TMAs can also propel complex 
fluids efficiently, such as a train of slugs, emulsion, liquid–solid fluid
mixtures and even petrol (Fig. 3a, b, Extended Data Fig. 4, and Parts 
2-5 of Supplementary Video 4), which have not yet been handled 
using existing light manipulation principles’. We also successfully 
manipulated liquids widely used in biomedical engineering with 
the TMAs, such as bovine serum albumin solution, phosphate 
buffer solution, cell culture medium and cell suspension (Part 6 of 
Supplementary Video 4), which is of great significance for biomedical 
analysis and micro-engineering. 

Furthermore, our TMA works as a micromixer, offering a versatile 
toolbox for microfluidic mixing. Extended Data Fig. 4 and Part 4 of 
Supplementary Video 4 demonstrate the mixing of polyethylene micro- 
spheres and ethyl acetate with the aid of vortex circulation in the moving 
liquid slug, which is ascribed to the viscous stresses caused by the shear 
between the slug and the inner surface. This stirring behaviour takes 
advantage of the hydrodynamic effect and thus relies on neither stirrers 
nor special microfluidic designs. Figure 3c shows that the light-induced 
vortex circulation strongly promotes the dissolution of benzophe- 
none in 75 vol% ethanol (Supplementary Video 5). Upon exposure to 
attenuated 470-nm light, benzophenone completely dissolves in ethanol 
within 45 s. However, a similar weight of benzophenone dissolves little 
under the same conditions but without the light irradiation (Extended
Data Fig. 5). The stirring behaviour in the photodeformable TMA is 
fully triggered by the external light, and allows repeatable and reversible 
switching between a non-mixing mode and an efficient mixing 
action. This approach could be easily adapted to diverse microfluidic 
configurations because it does not require the implementation of any 
specific element (such as a valve or an electrode)**. More interestingly, 
our TMA is able to drive a silicone oil slug to capture and convey a 
microsphere by spatially controlled irradiation (Fig. 3d, Supplementary 
Video 6), which implies great potential in applications of microscale 
reaction and micromechanical operation. 


Photodeformation mechanism of the TMAs 

All these liquid handling abilities arise from asymmetric photo- 
deformation of the TMAs in response to attenuated 470-nm light, 
which is a novel principle for inducing capillary force. It has been 
reported that azobenzene mesogens can be realigned along the 
direction perpendicular to the polarized direction of actinic linearly 
polarized blue light after repetition of trans–cis–trans isomerization
cycles (Extended Data Fig. 6), which is known as the Weigert effect*”. 
In the case of unpolarized light, the propagation direction is the only
direction perpendicular to every polarization direction present, and so
the azobenzene mesogens orientate along the propagation direction of
the actinic unpolarized light*”** (Fig. 4a). When the TMAs are exposed 
to unpolarized 470-nm light whose actinic direction is perpendicular to 
the long axis of the TMAs, the azobenzene mesogens are reorientated 
along the propagation direction (Fig. 4b), which has also been 
experimentally confirmed by 2D-WAXD patterns of the TMA wall 
cut and flattened out into a plane before and after irradiation (Extended 
Data Fig. 7). Therefore, the tilt angles φ of azobenzene mesogens in the
different exposed areas are different because the lamellae of the LLCP 
are arranged coaxially in the TMA wall. 

In order to facilitate understanding of this photo-reorientation, 
the wall of the TMA is flattened out into a plane, as shown in Fig. 4b. 
According to the tilt angle of azobenzene mesogens calculated from 2D 
WAXD (Fig. 2b), the azobenzene mesogens in ~70% of the exposed
area are reoriented to exhibit φ < 65°, which means this area expands
along the y axis. The rest of the azobenzene mesogens are tilted with
65° < φ < 90°, leading to contraction along the y axis. In other words,
the expansion of the light-exposed area is far larger than the contraction 


Figure 3 | Photocontrol of fluid slugs. a, Lateral 
photographs showing light-induced motion 

of a biphase fluid, containing an air bubble 
sandwiched between two silicone oil slugs. In all 
rows, the open arrows denote the intensity of the 
incident light. b, Lateral photographs showing 
light-controlled transportation of an emulsion 
prepared by dispersing rapeseed oil slugs in 
silicone oil (see Part 3 of Supplementary Video 4). 
Inset, an enlarged photograph exhibiting the 
small droplets of rapeseed oil in the emulsion. 
The diameter of the arrowed small droplet is 
~0.12 mm. c, Lateral photographs showing
that light-induced vortex circulation strongly
promotes the dissolution of benzophenone
(~0.03 mg) in ethanol (~0.3 μl, 75 vol%). See
Supplementary Video 5. d, Lateral photographs 
showing that a silicone oil slug captures and 
conveys a polyethylene microsphere (0.43 mm) 
through the light-induced deformation of the 
TMA (Supplementary Video 6). All photographs 
were taken through an optical filter that removed 
light with wavelengths below 530 nm; time is 
shown at lower right of each panel. 
Figure 4 | Mechanism of photodeformation of the TMA and velocity

of light-induced liquid motion. a, Schematics showing reorientation 

of mesogens in azobenzene-containing LC systems with non-polarized 
blue light that is incident at angle θ. Double arrows show the polarization
direction of the light. b, Schematics illustrating the reorientation of 
mesogens in the cross-sectional area of the TMA before and after 
irradiation by unpolarized 470-nm light. To facilitate understanding the 
photo-reorientation, the wall is flattened out into a plane. The normal 
direction of the lamellae is along the x direction in the scheme. Before 
irradiation by the light, φ of all the LC mesogens is 65° (top). On light
irradiation, the LC mesogens in the exposed surface of the TMA are
realigned to the direction of the actinic light, which results in the change
of φ in the exposed area (bottom). The orange and blue parts of the
cross-sectional area respectively expand and contract along the y axis on 
light irradiation. This photoinduced reorientation leads to the decrease 

in thickness of the TMA wall (along the x axis) and the elongation of the 
perimeter of the TMA (along the y axis), which contributes to the increase 
of cross-sectional area. c, Left, plot showing the area of six different 
cross-sections (S₁–S₆, shown right) before (red line) and after (black line)
irradiation by attenuated 470-nm light. Error bars, s.d. (n = 3). z represents 


of that area. This photoinduced reorientation results in a decrease of 
the thickness of the TMA wall (along the x axis in Fig. 4b) and an 
elongation of the perimeter of the TMA (along the y axis in Fig. 4b), 
which together cause an increase of the cross-sectional area of the 
TMA. Moreover, the higher the light intensity, the larger the increase 
in cross-sectional area. Figure 4c shows that the cross-sectional areas 
of the photodeformed TMA at different positions increase with the 
increase of the light intensity upon irradiation by attenuated 470-nm 
light, whereas the cross-sectional areas at different positions without 
irradiation are almost the same. Therefore, the TMA deforms to an 
asymmetric cone-like geometry, which generates adjustable capillary 
force to propel liquids in the direction of light attenuation (Fig. 1a, 
Supplementary Video 7). 


Discussion 

We find that the direction of movement of the slug can easily be 
controlled by varying the direction in which the intensity of the actinic 
light decreases (Supplementary Video 1). To the best of our knowledge, 
this kind of directed motion of liquids in closed channels has not been 
reported before. Supplementary Video 8 demonstrates that the TMA 
undergoes an obvious wall displacement upon exposure to unpolarized 
470-nm light. After turning off the light source, the TMA returns to its 
the distance between one end of the TMA and the cross-section. Length of
blue arrows denotes the intensity of 470-nm light, produced by varying its
attenuation. d, Plot showing the displacement of the exposed wall of the
TMA by alternately switching on and off 470-nm light. On irradiation by
unpolarized 470-nm light with intensity 125 mW cm⁻², the upper surface
of the TMA is displaced by ~16 μm (Supplementary Video 8). The upper
surface returns to its initial position immediately when the light is switched
off. The cycle of light-induced motion can be repeated as many as
100 times without obvious fatigue. e, Schematic exhibiting
self-propelled motion of a wetting slug in a conical TMA to the
narrower end, induced by photodeformation. In the conical geometry,
the opening angle is denoted by α, the distance of the slug from the
apex by x, the length of the slug by L, and the leading and trailing
contact angles by θ₁ and θ₂, respectively. f, Plot showing the moving
speeds of three different wetting liquids under the same irradiation
conditions: silicone oil (blue column), ethyl acetate (red column), and
hexane (green column). Error bars, s.d. (n = 3). g, Plot showing the slug
end position versus irradiation time when the intensity of the 470-nm light
source is different. I₁ = 125 mW cm⁻² (green circles), I₂ = 100 mW cm⁻²
(red squares), and I₃ = 60 mW cm⁻² (blue triangles).


initial size due to elastic recovery of unexposed regions and entropic 
restoring forces imparted by the exposed region. Such reversible 
deformation on intermittent irradiation with 470-nm light can 
be repeated over 100 cycles without obvious fatigue (Fig. 4d), because 
the creep of the LLCP is minimized by the smectic organization of the 
side groups, which might act as physical crosslinks. 

In the TMA, the slug of wetting liquid (Extended Data Table 1) 
is subjected to two forces: capillary driving force and viscous force, 
which oppose each other. The balance between these two forces yields 
a steady speed v ≈ αγ/(8η) that is independent of slug position x and 
slug length L, where α is the opening angle schematically shown in 
Fig. 4e, η is the dynamic viscosity of the liquid and γ is its surface tension. 
Hence the moving speed of different wetting liquids varies because of 
their different ratio of γ/η (Fig. 4f), when α is fixed. The speed at which 
hexane moves reaches 5.9 mm s⁻¹, which is the fastest speed of liquid 
motion driven by light-induced capillary force found so far. 
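
The dependence of slug speed on the liquid can be illustrated numerically. The following is a minimal sketch, assuming the steady-speed relation v ≈ αγ/(8η) described in the text. The surface tension and viscosity figures are approximate room-temperature handbook values, and the opening angle is a hypothetical number chosen only for illustration; none of them are measurements from this work.

```python
def steady_speed(alpha_rad: float, gamma: float, eta: float) -> float:
    """Steady slug speed in m/s from v ≈ alpha * gamma / (8 * eta).

    alpha_rad: opening angle of the cone in radians
    gamma:     surface tension in N/m
    eta:       dynamic viscosity in Pa·s
    """
    return alpha_rad * gamma / (8.0 * eta)


# Approximate handbook values (assumptions, not data from the paper):
# name -> (surface tension in N/m, dynamic viscosity in Pa·s)
LIQUIDS = {
    "hexane": (0.018, 0.00030),
    "ethyl acetate": (0.024, 0.00045),
}

ALPHA = 1.0e-3  # hypothetical opening angle in radians

speeds = {
    name: steady_speed(ALPHA, gamma, eta)
    for name, (gamma, eta) in LIQUIDS.items()
}

for name, v in speeds.items():
    gamma, eta = LIQUIDS[name]
    # gamma / eta has units of m/s and sets the speed for a fixed angle
    print(f"{name}: gamma/eta = {gamma / eta:.0f} m/s, v = {v * 1e3:.2f} mm/s")
```

Because the slug position and length do not appear in the relation, the only liquid-dependent quantity in this sketch is the ratio γ/η, which is why liquids are ranked by that ratio when the opening angle is held fixed.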

For liquids that only partially wet the wall of the TMA (Extended 
Data Table 1), we coated the inner wall with a layer that the liquid 
could wet completely. This enables partially wetting liquids to 
move like wetting liquids in the TMA. For example, ethanol wets a 
polyacrylamide surface but only partially wets the inner wall of the 
TMA, so a TMA coated with polyacrylamide propels an ethanol slug 
at a speed of ~0.3 mm s⁻¹. As for water, which has relatively large 
surface tension, we succeeded in enhancing both the wettability and 
the roughness of the TMA inner wall by applying a composite gel 
layer; this made water fully spread out, that is, become fully wetting 
(see details in Supplementary Information). Consequently, the coated 
TMA is capable of handling various aqueous liquids, including those 
used in biomedical applications (Part 6 of Supplementary Video 4). 
Since any partially wetting liquid can be propelled as a wetting liquid 
by modifying the TMA with a suitable coating layer, the resistance 
generated by contact-line pinning is completely excluded from liquid 
being moved using our system; thus our TMAs are in theory able to 
propel any liquid. 

The speed at which wetting liquids can be moved is also affected by 
the opening angle α of the conical TMAs, which can be simply tuned through the 
intensity of actinic light (Fig. 4g). For example, the moving speed of a 
silicone oil slug changed from 0.05 to 0.2 mm s⁻¹ when the intensity of 
the light source changed from 60 to 125 mW cm⁻². We further calculated 
the uniform velocity of the liquid slug when the capillary driving force 
and the viscous resistance reached balance; for the silicone oil slug 
we calculate this velocity to be 0.04–0.15 mm s⁻¹, which is close to 
the measured speed mentioned above (for details see Supplementary 
Information). Moreover, Part 1 of Supplementary Video 9 shows a 
silicone oil slug moving a distance of 57 mm, which is two orders of 
magnitude larger than its length (~0.53 mm). As long as the slug is 
confined in the conical TMA, it keeps moving until the actinic light 
is switched off: that is, there is theoretically no limit to the moving 
distance. Additionally, the movement of the slug is synchronized to the 
switching of the light source, whereas the previous principles based on 
light-induced wettability gradient require a period of time to activate 
droplet movement. 

In previous reports, droplet motions driven by light were usually 
limited to linear moving trajectories on a horizontal surface, and there 
was only one work related to light-driven motion of a droplet on a 
slope, specifically a 12° incline, with a velocity of ~0.002 mm s⁻¹. It is 
noteworthy that our TMAs can propel a silicone oil slug uphill on a 17° 
incline with a speed of ~0.1 mm s⁻¹ (Fig. 5a, Part 2 of Supplementary 
Video 9). More intriguingly, the versatile TMAs can not only enable 
light control of liquid motion on a horizontal ‘S’-shaped trajectory, 
but also enable helical trajectories in three dimensions (Fig. 5b, c, and 
Parts 3 and 4 of Supplementary Video 9). To our knowledge, this is 
the first time light-driven liquid motion has been achieved in curved 
closed microchannels. Compared with straight tubes, propelling liquids 
in curved tubes is more difficult because of remarkably larger flow 
resistance arising from secondary flow. Moreover, liquid fusion, 
which is crucial in biomedical fluid processing and microscale reactor 
operation, has been achieved by using a ‘Y’-shaped TMA (Fig. 5d, 
Part 5 of Supplementary Video 9). 

Our approach to handling liquids requires only a single, standard 
LED source: there is no need for special optical set-ups or high power 
laser sources. Moreover, the propelled liquids are confined in closed 
microchannels, and the irradiating light on the outer surface of the 
TMAs has no direct contact with the propelled liquids. Hence our light- 
driven liquid manipulation avoids photo-induced thermal effects and 
any resulting sample damage, which are especially undesirable for 
biology-oriented applications. In order to demonstrate the possibility 
of using 470-nm blue light as an excitation source to induce the 
motion of liquids in biological systems, a piece of ~1-mm-thick lean 
pork was placed between the light source and the TMA (Extended 
Data Fig. 8). The intensity of the incident light was reduced from 140 
to ~60 mW cm⁻² after passing through the pork; however, it was 
still able to drive the movement of a silicone oil slug at an average 
speed of 0.048 mm s⁻¹ owing to the ability of 470-nm blue light to 
penetrate into tissues (Part 6 of Supplementary Video 9). These 
experiments demonstrate that our TMAs have promise for application 
in microfluidic systems embedded in biological tissues. 
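
The attenuation reported for the pork experiment can be turned into a rough effective attenuation coefficient. This is a minimal sketch, assuming simple exponential (Beer-Lambert-type) attenuation, I_out = I_in · exp(−μd), with the quoted intensities (140 and ~60 mW cm⁻²) and the ~1-mm thickness. Real tissue scatters light strongly, so μ here is only an effective value, and any extrapolation to thicker tissue is hypothetical.

```python
import math


def effective_attenuation(i_in: float, i_out: float, thickness_mm: float) -> float:
    """Effective attenuation coefficient (per mm), assuming
    I_out = I_in * exp(-mu * thickness)."""
    return math.log(i_in / i_out) / thickness_mm


def transmitted(i_in: float, mu_per_mm: float, thickness_mm: float) -> float:
    """Intensity remaining after a slab of the given thickness,
    under the same exponential assumption."""
    return i_in * math.exp(-mu_per_mm * thickness_mm)


I_IN = 140.0   # mW cm^-2, incident intensity quoted in the text
I_OUT = 60.0   # mW cm^-2, approximate intensity after the pork
THICKNESS = 1.0  # mm, approximate thickness of the pork

mu = effective_attenuation(I_IN, I_OUT, THICKNESS)
print(f"transmittance = {I_OUT / I_IN:.2f}, effective mu = {mu:.2f} per mm")

# Hypothetical extrapolation: intensity expected behind 2 mm of similar tissue
print(f"after 2 mm: {transmitted(I_IN, mu, 2.0):.1f} mW cm^-2")
```

Under these assumptions, a little under half of the incident light survives the ~1-mm slab, and the transmitted intensity matches the lowest source intensity (60 mW cm⁻²) that the text reports as sufficient to move a silicone oil slug.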


Conclusions 

Our TMAs present a conceptually novel way to propel liquids by 
capillary force arising from photo-induced asymmetric deformation, 
which relies on neither wettability gradients nor the Marangoni effect. 
The TMAs can propel not only simple liquids spanning a broad range 
of polarity, but also complex fluids widely used in biomedical and 
chemical engineering; they thus have considerable potential application 
as micro-pumps in microsystems technology and architecture without 
any aid from additional components. Moreover, the demonstrated 
effective light-control of liquid mixing and the capture and movement 
of microspheres on the microscale could greatly simplify microfluidic 
devices. Therefore, our photodeformable TMAs are excellent candidates 
for application in the fields of micro-reactors, laboratory-on-a-chip 
contexts and micro-optomechanical systems. 


Figure 5 | Light-driven manipulation of liquid 
in straight, serpentine, helical and ‘Y’-shaped 
TMAs. a, Lateral photographs showing light- 
driven motion of a silicone oil slug in a straight 
TMA tilted up at an angle of 17° (Part 2 of Supplementary 
Video 9). The slug moved about 2 mm in 20 s. 
b, Lateral photographs showing light-driven 
motion of a silicone oil slug in a serpentine TMA 
(Part 3 of Supplementary Video 9). c, Lateral 
photographs showing the motion of a silicone oil 
slug in a helical TMA around the neck of a glass 
bottle (Part 4 in Supplementary Video 9). d, First 
image, photograph of a ‘Y’-shaped TMA; other 
images, lateral photographs showing light-driven 
liquid fusion of two silicone oil slugs 1 and 2 at 
the junction of the ‘Y’-shaped TMA (Part 5 of 
Supplementary Video 9). The white dashed frames 
indicate the outline of the slugs. The intensity of 
the 470-nm light source is 125–140 mW cm⁻², 
and the lengths of the open arrows indicate the 
varying intensity after attenuation. The actinic 
light direction is perpendicular to the long axis of 
the TMA. The photographs except the first image 
in d were taken through an optical filter to remove 
light with wavelengths below 530 nm. Scale bars, 
0.5mm. 
METHODS 

Preparation of LLCP fibres. A small amount of the LLCP (20 mg) was heated to 
120°C (isotropic phase) on a glass slide placed on a hot stage (Mettler, FP-90 and 
FP-82). LLCP fibres were prepared by dipping the tip of a toothpick into the melt 
and pulling away as quickly as possible. 

LLCP films prepared by solution processing. A solution of the LLCP in CHCl, 
(~1wt%) was drop-coated on a glass slide. The LLCP films formed after the 
evaporation of the CHCly, and were then annealed at 50°C for 1h and separated 
from the glass slide by immersion in water. 

Characterization. ¹H NMR and ¹³C NMR spectra of the azobenzene functional 
cyclooctene monomer C11AB6 and the LLCP were recorded on a Bruker DMX500 
NMR spectrometer using tetramethylsilane as the internal standard and CDCl₃ 
as solvent. MALDI-TOF-MS spectra of C11AB6 were measured on an AB SCIEX 
5800 spectrometer. Gel permeation chromatography of the LLCP was performed in 
THF with an eluent rate of 1.0 ml min⁻¹ on an Agilent 1100 with a G1310A pump, 
a G1362A refractive-index detector, and a G1314A variable-wavelength detector. 
AFM tapping mode images were acquired by using a Bruker Dimension FastScan 
AFM. The AFM samples of a thin LLCP film were prepared by spin-coating a flat 
glass substrate with the solution of the LLCP in CH₂Cl₂ (~0.1 wt%) and annealed 
at 50°C for 1h. High-resolution TEM images were recorded on a JEM-2100 TEM 
at an accelerating voltage of 200kV. The TEM samples of sheared LLCP film were 
cut at −100°C using a cryomicrotome apparatus (Leica, FC7-UC7) with a liquid 
nitrogen cooling instrument, and were mounted on a copper grid and stained with 
OsO₄ for 30 min to increase the mass-thickness contrast for TEM. 2D-WAXD 
experiments on the LLCP films and the TMAs prepared by solution processing 
were conducted on a Bruker D8 Discover diffractometer with a 2D detector of 
GADDS in transmission mode. The X-ray sources (Cu Kα, λ = 0.154 nm) were 
provided by 3kW ceramic tubes and the peak positions were calibrated with 
silicon powder (2θ > 15°) and silver behenate (2θ < 10°). The background 
scattering was recorded and subtracted from the sample patterns. High-resolution 
scanning electron microscopy (SEM) images were recorded on a field-emission 
scanning electron microscope (Zeiss, Ultra 55). 

The tensile stress-strain measurements of the LLCP fibre were performed 
using an Instron Universal Testing Machine (Model 5943) at a deformation rate of 
60 mm min⁻¹ in air. The toughness, a parameter that characterizes the work required 
to fracture the sample per unit volume, was calculated from the area below the tensile 
stress-strain curve until fracture. The wall thickness, cross-sectional area and pho- 
todeformation (photographs and displacement) of the TMAs as well as photographs 
and videos of the light-induced liquid motion were taken by a super-resolution 
digital microscope (Keyence, VHX-1000C). Visible light at 470 nm was obtained 
from an LED irradiator (CCS, HLV-24GR-3W). Attenuated 470-nm light was 
produced by placing a rectangular, continuously variable and metallic neutral density 
filter (Thorlabs, NDL-25C-4) in front of the 470-nm LED irradiator. The thermal 
effect of 470-nm light on the LLCP film was recorded by a thermal imaging camera 
(FLIR, E40). Contact angles (CAs) were measured on a contact angle analyser 
(Dataphysics OCA15) with 2 μl measuring droplets. Dynamic contact angles were 
obtained with a DCAT 21 tensiometer (DataPhysics Instruments GmbH). In each 
case, a minimum of three samples were analysed to ensure reproducibility. 
Extended Data Figure 1 | Lamellar structure of the LLCP film. a, AFM topographic image of the LLCP film. b, The line profile along the white 
line in a, showing that the thickness of one lamella and two lamellas is 4.38 nm and 8.53 nm, respectively. 
Extended Data Figure 2 | Mechanical properties of the LLCP. 
a, Photographs showing preparation of the healed LLCP fibre. The two 
cut pieces of the fibre (left) were overlapped for a length of 5 mm and 
covered by two glass slides (middle). Then, the fixed fibres were put into 
an oven and healed for 1h at 55°C (right). b, Photographs demonstrating 
the strength of a virgin and healed LLCP fibre (left and right, respectively). 
The inset in the left photograph shows that the fibre has a clip at each end; 
the upper clip is hung below an iron beam, and the bottom clip is 
loaded (weight w) with many iron rings strung together by an iron 
wire. The healed fibre supports a load of 51.52 g, while the virgin fibre 
is loaded with 55.53 g. c, Photographs demonstrating the toughness of a 
LLCP fibre. The LLCP fibre was stretched to 22 times its initial length 
by a tensile machine (Instron model 5943) (Supplementary Video 2). 
Left, unstretched; right, stretched. 
Extended Data Figure 3 | Mechanical robustness of a tubular 
microactuator, TMA. The sequence of photographs shows the unloaded 
TMA (left) buckling under an external force without damage. The buckled 
TMA (middle) spontaneously recovered its initial shape when the external 
force was released (right). This buckling was repeated for 30 cycles without 
any damage to the TMA (Supplementary Video 3). The TMA was clipped 
between the tips of a pair of tweezers, and was buckled by the opening and 
closing of the tweezers. The diameter of the TMA is 0.5 mm. 
Extended Data Figure 4 | Light-induced motion of a solid-liquid 
slug consisting of acetic ether and polyethylene microspheres. 
Top, a sequence of side-on photographs (at times of 0, 3, 6 and 7 s) 
showing the mixing of acetic ether (ethyl acetate) and PE (polyethylene) 
microspheres in the slug that occurs when the TMA photodeforms 
(Part 4 in Supplementary Video 4). Bottom, schematic illustration 
of vortex circulation in the slug. The diameter of the polyethylene 
microspheres is ~35 μm. The length of the open arrows denotes the 
intensity of 470-nm light. Image labels: a stack of PE microspheres; scale bar, 500 μm. 
Extended Data Figure 5 | Dissolution of benzophenone in ethanol through passive diffusion in the TMA. This sequence of side-on photographs 
(at times 0, 15, 25 and 45 s) shows that benzophenone (~0.03 mg) within an ethanol (~0.3 μl, 75 vol%) slug dissolves little over a period of 45 s without 
light irradiation. Image labels: benzophenone in ethanol (75 vol%); scale bar, 500 μm. 
Extended Data Figure 6 | Schematics demonstrating the mechanism of 
photoalignment of azobenzene mesogens under linear polarized blue 
light. Left, trans-azobenzene molecules with their transition moments 
parallel to the polarization direction of the light are effectively activated to 
their excited states, which is followed by trans-cis isomerization (middle); 
but molecules with their transition moments perpendicular to the 
polarization direction of actinic light are inactive towards isomerization. 
The cis-trans isomerization of azobenzene molecules is also induced by 
the light. After repetition of many trans-cis-trans isomerization cycles, 
trans-azobenzene molecules have reoriented to be perpendicular to the 
polarization direction of the actinic light, and hence inactive towards the 
incident radiation (right); this production of a net population of trans- 
azobenzene molecules aligned perpendicularly to the light polarization 
is known as the ‘Weigert effect’. 
Extended Data Figure 7 | Effect of irradiation on the 2D-WAXD 
patterns of the flat TMA wall cut and flattened out into a plane. 
a, b, Before (a) and after (b) irradiation. A higher intensity and a longer 
irradiation time of the 470-nm light are employed in this 2D-WAXD 
measurement compared with those in the experiments on light-induced 
liquid motion in the TMA, which ensures that most of the azobenzene 
mesogens in the flat wall are reorientated along the light propagation 
direction. Thus, the 2D-WAXD signal of the flat wall is strong enough 
to be detected. The X-ray beam is applied from the lateral side of the wall 
and parallel to the plane of the wall. 2θ denotes the diffraction angle and 
d represents the lateral distance between the azobenzene mesogens. Yellow 
arrows denote the horizontal direction of the flat TMA wall. 
Panel labels: 2θ = 20.5°, d = 0.44 nm (both panels); irradiation, 470-nm light, 200 mW cm⁻², 2 min. 
Extended Data Figure 8 | Light-driven liquid motion on irradiation by 
470-nm light that has been attenuated by passing through lean pork. 
The white and green arrows indicate the leading edge and the trailing edge 
of a silicone oil slug, respectively. The length of the open arrows denotes 
the intensity of 470-nm light. The volume of the silicone oil slug is 0.2 μl; 
the thickness of the lean pork and the glass slide is ~1 mm and 1.2 mm, 
respectively. Part 6 of Supplementary Video 9 shows this process in full. 
Extended Data Table 1 | Dynamic and static contact angles of different liquids on the LLCP film surface 

Probe liquid        θ (°)       θa (°)       θr (°)      Δθ (°) 
Silicone oil        0           0            0           0 
Petroleum ether     0           0            0           0 
Acetic ether        0           0            0           0 
Hexane              0           0            0           0 
Acetone             17.7±2.1    67.8±0.6     67.0±0.6    0.8±0.1 
Isopropyl alcohol   14.6±3.8    69.2±1.0     69.2±1.1    0.1±0.1 
Ethanol             17.5±3.6    69.8±1.2     68.3±1.0    1.5±0.2 
Water               96.5±2.6    127.3±2.3    73.0±2.6    54.4±3.4 

θ, θa, θr, Δθ represent static contact angle, advancing angle, receding angle and the difference between advancing angle and receding angle on flat LLCP films; 
data are from three individual measurements of each variable. Errors, s.d. 
Structure-based discovery of opioid 
analgesics with reduced side effects 


Aashish Manglik, Henry Lin, Dipendra K. Aryal, John D. McCorvy, Daniela Dengler, Gregory Corder, Anat Levit, 
Ralf C. Kling, Viachaslau Bernat, Harald Hübner, Xi-Ping Huang, Maria F. Sassano, Patrick M. Giguère, Stefan Löber, 
Da Duan, Grégory Scherrer, Brian K. Kobilka, Peter Gmeiner, Bryan L. Roth & Brian K. Shoichet 


Morphine is an alkaloid from the opium poppy used to treat pain. The potentially lethal side effects of morphine and related 
opioids, which include fatal respiratory depression, are thought to be mediated by μ-opioid-receptor (μOR) signalling 
through the β-arrestin pathway or by actions at other receptors. Conversely, G-protein μOR signalling is thought to 
confer analgesia. Here we computationally dock over 3 million molecules against the μOR structure and identify new 
scaffolds unrelated to known opioids. Structure-based optimization yields PZM21, a potent Gi activator with exceptional 
selectivity for μOR and minimal β-arrestin-2 recruitment. Unlike morphine, PZM21 is more efficacious for the affective 
component of analgesia versus the reflexive component and is devoid of both respiratory depression and morphine-like 
reinforcing activity in mice at equi-analgesic doses. PZM21 thus serves as both a probe to disentangle μOR signalling and 
a therapeutic lead that is devoid of many of the side effects of current opioids. 


Opiate addiction, compounded by the potentially lethal side effects 
of opiates such as respiratory depression, has driven optimization 
campaigns for safer and more effective analgesics since the 19th 
century. Although the natural products morphine and codeine, and 
the semi-synthetic drug heroin, are more reliably effective analge- 
sics than raw opium, they retain its liabilities. The classification of 
opioid receptors into μ, δ, κ and nociceptin subtypes raised 
hopes that subtype-specific molecules would lack the liabilities of 
morphinan-based opiates. Despite the introduction of potent synthetic 
opioid agonists like methadone and fentanyl, and the discovery of 
endogenous opioid peptides, developing analgesics without the 
drawbacks of classic opioids has remained an elusive goal. Recent 
studies have suggested that opioid-induced analgesia results from 
μOR signalling through the G protein Gi, while many side effects, 
including respiratory depression and constipation, may be conferred 
via β-arrestin pathway signalling downstream of μOR activation 
(Fig. 1a). Agonists specific to the μOR and biased towards the Gi 
signalling pathway are therefore sought both as therapeutic leads and 
as molecular probes to understand μOR signalling. Recent progress 
has supported the feasibility and potential clinical utility of such biased 
μOR agonists. 

The determination of the crystal structures of the μ, δ, κ and nociceptin 
opioid receptors (Fig. 1b, c) provided an opportunity to seek 
new μOR agonists via structure-based approaches. Recent discovery 
campaigns have used crystal structures of other Family A G-protein- 
coupled receptors (GPCRs) to computationally dock large libraries of 


Figure 1 | Structure based ligand discovery 
diverge between the opioid receptors. 
c, Conserved features of opioid ligand recognition 
in the μOR. d, Overlaid docking poses of 
23 compounds selected for experimental testing. 
e, Single-point competition binding assay of 
23 candidate molecules against the μOR antagonist 
³H-diprenorphine. Each ligand was tested at 20 μM 
and for those with >25% inhibition affinity 
was calculated in full displacement curves; data 
represent mean ± s.e.m. (n = 3 measurements). 
One of these hits, compound 7, was subsequently 
optimized. f, Docking pose of compound 7. 
molecules, identifying ligands with new scaffolds and with nanomolar- 
range potencies. We thus targeted the μOR for structure-based 
docking, seeking ligands with new chemotypes. We reasoned that such 
new chemotypes might confer signalling properties with new biological 
effects, as has been true for other structure-based campaigns. 


Structure-based docking to the μOR 

We docked over 3 million commercially available lead-like 
compounds against the orthosteric pocket of inactive μOR, 
prioritizing ligands that interact with known affinity-determining 
residues and with putative specificity residues that differ among the 
four opioid receptor subtypes (Fig. 1b, d). For each compound, an 
average of 1.3 million configurations was evaluated for complementarity 
to the receptor using the physics-based energy function in 
DOCK3.6. As is common in docking and screening, the top ranking 
molecules were inspected for features not explicitly captured in the 
scoring function. We manually examined the top 2,500 (0.08%) docked 
molecules for their novelty and their interactions with key polar residues 
such as Asp147³·³² (superscripts indicate Ballesteros-Weinstein 
numbering), and deprioritized those that showed conformational 
strain (a term occasionally poorly modelled by the scoring function). 
Ultimately, 23 high-scoring molecules with ranks ranging from 237 
to 2,095 out of the over 3 million docked were selected for testing 
(Fig. 1e). Compared to the 5,215 μOR ligands annotated in 
ChEMBL16, these docking hits had Extended Connectivity 
Fingerprint 4 (ECFP4)-based Tanimoto coefficients (Tc) ranging 
from 0.28 to 0.31, which is consistent with the exploration of novel 
scaffolds. Of the 23 tested, seven had μOR binding affinities 
(Ki) ranging from 2.3 μM to 14 μM (Extended Data Table 1, Extended 
Data Fig. 1). 
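
The novelty measure used above can be sketched in a few lines. This is a minimal illustration of the Tanimoto coefficient on fingerprints represented as sets of "on" bit indices. The toy fingerprints are invented for the example; real ECFP4 fingerprints would come from a cheminformatics toolkit, and taking the maximum similarity to any annotated ligand as the novelty score is an assumption about how such a comparison is usually set up, not a detail confirmed by the text.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| for two sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def max_similarity(query: set[int], references: list[set[int]]) -> float:
    """Highest Tanimoto coefficient between a query and any reference ligand."""
    return max(tanimoto(query, ref) for ref in references)


# Toy fingerprints (hypothetical bit indices, not real ECFP4 output)
known_ligands = [
    {1, 4, 9, 16, 25, 36},
    {2, 4, 8, 16, 32, 64},
]
docking_hit = {4, 16, 100, 101, 102, 103, 104}

score = max_similarity(docking_hit, known_ligands)
print(f"max Tc to known ligands = {score:.2f}")

# Fraction of the library inspected by eye, as quoted in the text
inspected_fraction = 2_500 / 3_000_000
print(f"inspected fraction = {inspected_fraction:.2%}")
```

A low maximum coefficient means that the hit shares few substructure features with any annotated ligand, which is the sense in which low Tc values are read as evidence of a new scaffold.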

The new ligands are predicted to engage the μOR in new ways 
(Fig. 1f and Extended Data Fig. 1). Most opioid ligands use a cationic 
amine to ion-pair with Asp147³·³², a canonical interaction observed 
in structures of the μOR, δOR, κOR and nociceptin receptor bound to 
Figure 2 | Discovery of a novel Gi/o-biased μOR agonist. a, Compound 
12 was identified among a series of analogues to compound 7 and further 
investigated due to its μOR specificity and efficacy as a μOR agonist. 
b, Docking pose of compound 12. c, Compound 12 is a μOR agonist in 
a Gi/o signalling assay with an EC50 of 180 nM. DAMGO is a prototypical 
unbiased opioid agonist. d, Despite robust activation of Gi/o, compound 12 
induces minimal arrestin recruitment as compared to DAMGO. For c, d, 
data are mean ± s.e.m. of normalized results (n = 3-6 measurements). 
ligands of different scaffolds. As anticipated, the docked ligands 
recapitulated this interaction. Much less precedent exists for the 
formation of an additional hydrogen bond with this anchor aspartate, 
often mediated in the docking poses by a urea amide. In several of 
the new ligands the urea carbonyl is modelled to hydrogen bond with 
Tyr148³·³³, while the rest of the ligands often occupy sites unexplored 
by morphinans (Extended Data Fig. 1). To our knowledge, the double 
hydrogen bond coordination of Asp147³·³² modelled in the docking 
poses has not been anticipated or observed previously for opioid 
ligands, and only 50 of the 5,215 annotated opioid ligands in 
ChEMBL16 contain a urea group. 

Despite the structural novelty of the initial docking hits, their 
affinities were low. To enhance binding and selectivity, we docked 
500 analogues of compounds 4, 5 and 7 that retained the key recogni- 
tion groups but added packing substituents or extended further towards 
the extracellular side of the receptor, where the opioid receptors are 
more variable. Of the 15 top-scoring analogues that were tested, seven 
had Ki values between 42 nM and 4.7 μM (Extended Data Table 2). 
Encouragingly, several were specific for the μOR over κOR (compounds 
12-15, Extended Data Table 2). We then investigated the more potent 
analogues for signalling potency and efficacy. Although the structure 
we docked against was the inactive state of the μOR, compounds 
8 and 12-14 activated Gi/o (Extended Data Table 2). A similar enrichment 
for agonists was previously seen in a docking study against the 
inactive state of the κOR, perhaps reflecting the small changes in 
the orthosteric pocket associated with opioid receptor activation. 
Encouragingly, the most potent compound, 12 (Fig. 2a, b), strongly 
activated Gi/o with low levels of β-arrestin-2 recruitment (Fig. 2c, d). 


Structure-guided synthetic optimization 

To optimize compound 12, we synthesized stereochemically pure 
isomers and introduced a phenolic hydroxyl (Fig. 3a). The synthesis 
of the (S,S) stereoisomer of 12 improved affinity (Ki) to 4.8 nM and 
had a signalling EC50 of 65 nM; it was the most potent and efficacious 
Gi/o signalling agonist among the four isomers (Fig. 3e). The phenolic 
hydroxyl, introduced to make compound (S,S)-21, was designed to 
exploit a water-mediated hydrogen bond with His297⁶·⁵², an interaction 
observed in the structure of μOR in complex with β-funaltrexamine 
(β-FNA) (Fig. 3b) and in other structures of the δOR and κOR. This 
hydroxyl was readily accommodated in the docked μOR-12 complex, 
improving the predicted docking energy (Fig. 3c). Compound (S,S)-21 
had an EC50 of 4.6 nM in a Gi/o activation assay, with 76% efficacy 
(Fig. 3f), and a Ki of 1.1 nM in radioligand binding assays (Extended 
Data Table 3), an improvement of 40-fold versus 12. The other 
three stereoisomers of (S,S)-21 were much less potent or efficacious 
(Extended Data Fig. 2a, b), suggesting a specific stereochemical 
requirement for both potency and efficacy in agreement with the 
docked poses of (S,S)-21 to the inactive and active structures of μOR 
PZM21 henceforth. 

Because PZM21 was discovered against the inactive structure of 
μOR, its docked complex to active μOR retains ambiguities. To investigate 
its receptor-bound structure further, more detailed docking and 
molecular dynamics simulations were conducted. The resulting model 
was tested by synthesizing molecules that either perturbed or exploited 
specific modelled interactions (Fig. 3c, d, Extended Data Figs 2 and 3). 
Neutralization of charge by amidation (compound PZM28) decreases 
potency by 1,000-fold, supporting a key ionic interaction between the 
PZM21 tertiary amine and Asp147³·³² (Fig. 3d and Extended Data 
Fig. 3). Compound PZM27, which adds steric bulk to the tertiary 
amine, was synthesized to disrupt putative hydrophobic interactions 
between the N-methyl group and Met151³·³⁶ and Trp293⁶·⁴⁸, consistent 
with its 30-fold loss of potency and decreased efficacy (Fig. 3d and 
Extended Data Fig. 3). Compounds PZM23, PZM24 and PZM25, 
which were synthesized to disrupt hydrogen bonding interactions in 
the model between the urea and Asp147³·³², Tyr326⁷·⁴³ and Gln124²·⁶⁰, 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


Figure 3 | Structure-guided optimization 
Panel labels: compound 12, Ki = 42 nM; resolve stereochemistry (e); compound (S,S)-12, Ki = 4.8 nM; add phenolic hydroxyl (b, c); Ki = 1 nM; inactive μOR; active μOR; μOR-PZM21 interactions. 
lose between 30- and 230-fold potency despite their decreased solvation 
penalties (Fig. 3d and Extended Data Fig. 3). These key ionic and 
hydrogen-bonding interactions are maintained for 3 μs of molecular 
dynamics simulations of PZM21 in complex with active μOR, as are 
interactions between the phenolic hydroxyl and the bridging waters 
to His297⁶·⁵², further supporting their relevance to the modelled pose 
(Extended Data Fig. 2g). The thiophene of PZM21, modelled to fit in 
the more open specificity region of the μOR, can be replaced with a 
larger benzothiophene without loss of potency (Extended Data Fig. 3). 
Interactions of this thiophene with residues that differ among the opioid 
receptor sub-types may contribute to PZM21 specificity (Extended Data 
Fig. 2e). More compellingly, the simulations and docking predict 
that the PZM21 thiophene comes within 6 Å of Asn127²·⁶³ in the 
active μOR (Extended Data Fig. 2g). Accordingly, we synthesized an 
irreversible version of PZM21 (compound PZM29) designed to form 
a covalent bond with μOR engineered with an N127C mutation. 
Compound PZM29 binds irreversibly to this mutant but not the wild- 
type receptor and retains its efficacy as an agonist (Extended Data 
Fig. 3), supporting the overall orientation of PZM21 as modelled and 
simulated in the orthosteric μOR site. 


PZM21 is a selective Gi-biased µOR agonist

PZM21 had no detectable κOR or nociceptin receptor agonist
activity (it is actually an 18 nM κOR antagonist), while it is a
500-fold weaker δOR agonist (Extended Data Fig. 4 and Extended Data
Table 3), making it a selective µOR agonist. To investigate specificity
more broadly, PZM21 was counter-screened for agonism against 316
other GPCRs. Activity at 10 µM was observed at several peptide and
protein receptors; however, no potent activity was confirmed with a
full dose-response experiment at these receptors. PZM21 therefore
has high agonist specificity among GPCRs (Extended Data Fig. 5a-c).
PZM21 was also tested for inhibition of the hERG ion channel and
the dopamine, norepinephrine and serotonin neurotransmitter
transporters. At hERG, PZM21 had an IC50 of between 2 and 4 µM,
500- to 1,000-fold weaker than its potency as a µOR agonist (Extended
Data Fig. 5d). Its inhibition of the neurotransmitter transporters, which
are also analgesia targets, was even weaker, with IC50 values ranging
from 7.8 to 34 µM (Extended Data Fig. 5e). Thus, PZM21 is a potent,
selective, and efficacious µ opioid agonist.

A major goal of this study was to find new chemotypes that might
display biased signalling and perhaps, unlike canonical opioid drugs,
have more favourable in vivo profiles. Signalling by PZM21 and other
µOR agonists appears to be mediated primarily by the heterotrimeric
G protein Gi/o, as its effect on cAMP levels was eliminated by pertussis
toxin and no activity was observed in a calcium release assay (Extended
Data Fig. 6a-d). A maximal concentration of PZM21 led to no detectable
β-arrestin-2 recruitment in the PathHunter assay (DiscoveRx)
(Fig. 3g and Extended Data Fig. 6c) and a minimal level of µOR
internalization compared to DAMGO and morphine (Extended Data
Fig. 6e). Indeed, β-arrestin-2 recruitment was too low to even permit
a formal calculation of bias, which quantifies the preference for one
signalling pathway over another. Since β-arrestin recruitment can
depend on the expression level of G protein-coupled receptor kinase 2
(GRK2), we also investigated Gi/o signalling and arrestin
recruitment in cells co-transfected with this kinase. Even in the presence
of overexpressed GRK2, PZM21 still has weak arrestin recruitment
efficacy compared to DAMGO and even to morphine (Extended Data
Fig. 6g-i). In fact, the signalling bias of PZM21 was indistinguishable
from that of TRV130, a Gi-biased opioid agonist now in Phase III clinical
trials (Fig. 3f, g), whereas its G-protein bias substantially exceeded that of
herkinorin, which has also been purported to be a Gi-biased agonist
(Extended Data Fig. 6). An intriguing distinction in these signalling
studies is the lack of agonist activity of PZM21 at κOR. While PZM21 is
an 18-nM antagonist of this receptor, the other biased agonist, TRV130,
activates κOR with similar potency to morphine (Extended Data
Fig. 6f). Additionally, despite having similar levels of signalling bias,
in modelling studies TRV130 and PZM21 appear to engage the µOR
in distinct ways (Extended Data Fig. 2f).


Analgesia with diminished side effects

Consistent with its µOR agonist activity, PZM21 displayed
dose-dependent analgesia in a mouse hotplate assay, with a per cent maximal
possible effect (%MPE) of 87% reached 15 min after administration of
the highest dose of drug tested (Fig. 4a). The highest dose of morphine
tested plateaued at 92% after 30 min. Intriguingly, we observed no
analgesic effect for PZM21 in the tail-flick assay (Fig. 4b). Such a distinction
is unprecedented among opioid analgesics. The hotplate experiment
assesses analgesia at both higher-level central nervous system (CNS)
brain and spinal nociceptive circuits, while the tail-flick experiment
is more specific for spinal reflexive responses. Subcategorizing the
behavioural responses in the hotplate experiment as either affective
(CNS mediated) or reflexive (spinally mediated) showed that, unlike
morphine, PZM21 solely confers analgesia to the affective component
of pain (Fig. 4c and Extended Data Fig. 7a, b). Though separation
of these two analgesic pathways is unique to PZM21 among known
opioid analgesics, it has been observed with selective chemogenetic
activation or toxin-induced inactivation of CNS neurons in rodents.
Indeed, PZM21 is also active in a formalin injection nociception assay,
likely from supraspinal activation of descending inhibitory circuits
(Fig. 4d). Whether this circuit-specificity reflects the biased
signalling of PZM21, its specificity for the µOR versus other opioid
receptors and other GPCRs, an unusual CNS distribution
phenomenon, or some other signalling property, is uncertain at this time.

More certain is that PZM21 analgesia results from µOR activation
in vivo, as genetic knockout of the µOR completely ablates the observed
analgesic response in the hotplate assay (Fig. 4e). Meanwhile, PZM21
is metabolized relatively slowly by mouse liver microsomes, with only
8% metabolism over one hour. Signalling experiments with the
resulting metabolite pool show no evidence of a metabolite with more
potent activation of the µOR, confirming that the observed analgesic
activity results primarily from the originally administered dose of
PZM21 (Extended Data Fig. 7e, f).

Based on previous genetic studies with arrestin knockout mice and
pharmacological studies with biased compounds, we anticipated
that PZM21 would confer longer-lasting analgesia with decreased
respiratory depression and constipation, both key dose-limiting
side effects of classic opioid agonists. Analgesia induced by PZM21 lasts up
to 180 min, substantially longer than that induced by a maximal dose
of morphine (Extended Data Fig. 7a, b) and the biased agonist TRV130
(Fig. 4a). Whereas PZM21 does reduce defecation, its constipating effect
is substantially less than that of morphine (Fig. 4f). Respiratory depression was
investigated by dosing unrestrained mice with equi-analgesic doses
of PZM21, TRV130 and morphine (40 mg/kg, 1.2 mg/kg, and
10 mg/kg, respectively), and measuring respiration by whole-body
plethysmography. While morphine profoundly depressed respiration
frequency, PZM21 was almost indistinguishable from vehicle
(Fig. 4g). By comparison, TRV130 significantly depresses respiration
at 15 min, correlating with its peak analgesic response. Although
respiratory depression by µOR may be partially mediated by activation
of G-protein-coupled inwardly rectifying potassium channels (GIRKs),
systemically infused opioids can decrease respiratory frequency even
in GIRK-deficient mice, consistent with the G-protein-independent
signalling mechanisms first suggested by the arrestin knockout
studies. The rapid respiratory depression observed for morphine
and TRV130 may reflect GIRK activation. At later time points, however,
PZM21 induces minimal respiratory depression despite providing
robust analgesia. Conversely, morphine induces a prolonged course of
respiratory depression that does not subside with resolution of the
analgesic response at 90 min. This dissociation in analgesia and respiratory
depression at later time points may reflect differential recruitment of
β-arrestin-2. Taken together, these studies support minimal β-arrestin-2
signalling in vivo by PZM21 (biased signalling, Fig. 1a).

A major liability of current opioid analgesics is reinforcement
and addiction, which are both postulated to be mediated, at
least in part, by activation of the dopaminergic reward circuits.
A biomarker for such activation in mice is an acute hyperlocomotive
response, reflecting mesolimbic dopaminergic activation. Whereas
morphine induced mouse hyperlocomotion in an open-field assay
(Fig. 4h), a nearly equi-analgesic dose of PZM21 had no apparent
effect on locomotion versus vehicle. The decreased distance travelled
does not reflect a cataleptic effect of PZM21 (Extended Data Fig. 7c).
Consistent with decreased activation of reward circuits, administration
of PZM21 also does not induce a conditioned place preference response
(Fig. 4i), unlike morphine and other opioids. Though TRV130 does
trend more towards inducing place preference, its activity is also not
significant relative to vehicle; this lack of conditioned place preference
for both biased agonists may support a role for G-protein bias in the
lack of opioid-induced reinforcing behaviour. The differences between
morphine and PZM21 in conditioned place preference do not simply
reflect dissimilarities in CNS penetration between the two drugs,
as a substantial fraction of PZM21 crosses the blood-brain barrier
(Extended Data Fig. 7d).

Several caveats deserve to be mentioned. Although structure-based
discovery succeeded in finding novel scaffolds and supported facile
optimization, some of the properties of PZM21 were likely fortuitous.
Biased signalling through G protein and arrestin pathways reflects the
stabilization of conformations over 30 Å from the orthosteric site where
PZM21 binds. We did not select molecules that preferentially stabilize
these conformations, but instead relied on chemical novelty to confer
new biological properties. Receptor subtype selectivity was attained
by simply selecting molecules that extended into variable regions of
the receptor, a strategy that may not always work. Several aspects of
the pharmacology presented here remain preliminary, including
the metabolic stability studies and the pharmacokinetics, and it is
not clear at this time whether the unprecedented in vivo activity of
PZM21 reflects its biased and specific agonism, or some other feature
conferred by its novel chemotype. Finally, identification of agonists
from docking to an inactive state receptor structure cannot always be
relied upon, though there is precedent for doing so against
opioid receptors.


Discussion 

Notwithstanding these caveats, this study supports a structure-based
approach for GPCR ligand discovery. Whereas this method cannot
yet reliably find leads with tailored specificity and signalling efficacy,
it can reliably identify entirely new scaffolds and chemotypes. These
new chemotypes may stabilize receptor conformations not explored
previously and so generate novel biological effects. With a novel
chemotype in hand, the docked structure provides a straightforward strategy
for optimization. Here, we optimized an initial docking hit, compound
7, 1,000-fold to the final lead molecule, PZM21, by evaluating fewer
than 50 molecules. Though this campaign was inspired by existing
µOR-biased agonists like TRV130, the structure-based approach led
to a compound with novel properties; it was structurally distinct
from previously explored opioid ligands, with not only substantial
signalling bias but also unexpected opioid receptor selectivity.
These features have contributed to favourable biological effects, with
long-lasting analgesia coupled to apparent elimination of respiratory
depression, specificity for central over reflex analgesia, lack of
locomotor potentiation and conditioned place preference, and hence a reduced
potential for opioid-induced reinforcement for PZM21 and molecules
like it. The selectivity, potency, and biased signalling of PZM21 make it
a tool molecule of a sort previously unavailable to interrogate µOR
signalling. More broadly, the in vitro results of multiple GPCR campaigns,
and the in vivo results reported here, portend a general approach to
the problem of new tool and lead discovery for this pharmacologically
important family of receptors.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper.

METHODS


No statistical methods were used to predetermine sample size. 

Chemicals, reagents, and cell lines. Chemicals and reagents used in this study
were purchased from commercial sources (Sigma, Tocris, Fisher Scientific, ZINC
database suppliers) or synthesized as outlined in the Supplementary Information.
HEK293 (ATCC CRL-1573; 60113019; certified mycoplasma free and authentic
by ATCC) and HEK293-T (HEK293T; ATCC CRL-11268; 59587035; certified
mycoplasma free and authentic by ATCC) cells were from the ATCC and are
well validated for signalling studies. Cells were also validated by analysis of short
tandem repeat (STR) DNA profiles and these profiles showed 100% match at the
STR database from ATCC. U2OS cells expressing human µOR were obtained as
cryopreserved stocks from DiscoverX and were not further authenticated.
Molecular docking and analogue selection. The inactive-state µ-opioid receptor
structure (PDB: 4DKL) was used as input for receptor preparation with DOCK
Blaster (http://blaster.docking.org). Forty-five matching spheres were used based
on a truncated version of the crystallized ligand. The covalent bond and linker
region of the antagonist β-funaltrexamine were removed for sphere generation. The
ligand sampling parameters were set with bin size, bin size overlap, and distance
tolerances of 0.4 Å, 0.1 Å, and 1.5 Å, respectively, for both the matching spheres
and for the docked molecules. Ligand poses were scored by summing the
receptor-ligand electrostatics and van der Waals interaction energy corrected for ligand
desolvation. Receptor atom partial charges were used from the united atom
AMBER force field except for Lys233 and Tyr326, where the dipole moment
was increased as previously described. Over 3 million commercially available
molecules from the ZINC (http://zinc.docking.org) lead-like set were docked
into the receptor using DOCK3.6 (http://dock.compbio.ucsf.edu). Molecules among the
top-ranking 0.08% were inspected and 23 were selected for
experimental testing in the primary screen. A resource to perform these docking studies
is publicly available (http://blaster.docking.org).
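
As a rough illustration of the ranking step described above, the following sketch sums per-pose energy terms into a single score and keeps the top-ranking fraction of a library. The function names, the lower-is-better sign convention and the toy energies are assumptions made for illustration; this is not DOCK3.6 code or output, and the library size is taken as exactly 3 million although the text says only "over 3 million".

```python
def pose_score(electrostatic, van_der_waals, ligand_desolvation):
    """Sum the interaction energies and add the ligand desolvation correction.

    All terms are hypothetical energies in arbitrary units; lower is better.
    """
    return electrostatic + van_der_waals + ligand_desolvation


def top_fraction(scored_molecules, fraction):
    """Return the best-scoring share of (name, score) pairs, lowest score first."""
    n_keep = max(1, round(len(scored_molecules) * fraction))
    return sorted(scored_molecules, key=lambda pair: pair[1])[:n_keep]


# 0.08% of an assumed 3-million-molecule library gives the inspected shortlist size.
shortlist_size = round(3_000_000 * 0.0008)

# Toy library of four hypothetical molecules.
toy_library = [
    ("mol_a", pose_score(-20.0, -25.0, 8.0)),
    ("mol_b", pose_score(-10.0, -15.0, 3.0)),
    ("mol_c", pose_score(-30.0, -20.0, 12.0)),
    ("mol_d", pose_score(-5.0, -12.0, 2.0)),
]
best = top_fraction(toy_library, 0.25)
print(shortlist_size, best)
```

Under these assumptions the inspected shortlist holds roughly 2,400 molecules, from which the 23 tested compounds were chosen by eye.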

For a secondary screen, analogues of the top three hits from the primary
screen (compounds 4, 5 and 7) with a similarity of greater than 0.7 (as defined
in the ZINC search facility) were identified in the ZINC database. Additionally,
substructure searches were performed using the scaffolds of each of these three
compounds. The searches yielded 500 purchasable compounds, which were
then docked as in the primary screen. Analogues were manually inspected for
interactions and selected for further experimental testing.
Radioligand binding studies. For a primary screen of selected molecules,
binding to µOR was assessed by measuring competition against the radioligand
3H-diprenorphine (3H-DPN). Each compound was initially tested at 20 µM and
was incubated with 3H-DPN at a concentration equal to the Kd (0.4 nM) of the
radioligand in µOR-containing Sf9 insect cell membranes. The reaction contained
40 fmol of µOR and was incubated in a buffer of 20 mM HEPES pH 7.5, 100 mM
sodium chloride, and 0.1% bovine serum albumin for 1 h at 25 °C. To separate
free from bound radioligand, reactions were rapidly filtered over Whatman GF/B
filters with the aid of a Brandel harvester and 3H-DPN counts were measured by
liquid scintillation. Compounds with more than 25% of 3H-DPN radioactivity
were further tested in full dose-response to determine the affinity (Ki) in HEK293
membranes. Subsequently, the 15 analogues were tested in full dose-response
for affinity at the µOR and the κOR by the National Institute of Mental
Health Psychoactive Drug Screening Program (PDSP), as were the affinities of
compounds 12, PZM21, and their stereoisomers at the µOR, δOR, κOR and
nociceptin receptor.
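
The text does not say how IC50 values from the competition curves were converted to affinities. One common choice, assumed here purely for illustration, is the Cheng-Prusoff correction. With the radioligand at a concentration equal to its Kd, as in the primary screen, the correction reduces to Ki = IC50/2. The IC50 below is a hypothetical value, not a measurement from this study.

```python
def cheng_prusoff_ki(ic50_nm, radioligand_nm, radioligand_kd_nm):
    """Convert a competition IC50 to Ki (Cheng-Prusoff; an assumed method here)."""
    return ic50_nm / (1.0 + radioligand_nm / radioligand_kd_nm)


# Radioligand concentration and Kd are both 0.4 nM, as stated for the primary screen.
# The 10 nM IC50 is hypothetical.
ki_nm = cheng_prusoff_ki(ic50_nm=10.0, radioligand_nm=0.4, radioligand_kd_nm=0.4)
print(f"Ki = {ki_nm:.1f} nM")
```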

Radioligand depletion assays to test the irreversible binding of compound
PZM29 were performed as described previously. Human embryonic kidney
293 (HEK 293) cells were transiently transfected with µOR or the cysteine mutant
µOR:N127C using the Mirus TransIT-293 transfection reagent (MoBiTec,
Goettingen, Germany), grown for 48 h, harvested, and homogenates were
prepared as described. For radioligand depletion experiments, homogenates were
preincubated in TRIS buffer (50 mM Tris at pH 7.4) at a protein concentration of
50-100 µg/ml or 70-120 µg/ml for µOR and µOR:N127C, respectively, and the
covalent ligand (at 541M) for different time intervals. Incubation was stopped by 
centrifugation and reversibly bound ligand was washed three times (resuspension 
in buffer for 30 min and subsequent centrifugation). Membranes were then used 
for radioligand binding experiments with 3H-diprenorphine (final concentration:
0.7 nM, specific activity: 30 Ci/mmol, purchased from Biotrend, Cologne,
Germany) to determine specific binding at the µOR (Bmax = 4,000-6,500 fmol/mg
protein, KD = 0.25-0.45 nM) and the µOR:N127C receptor (Bmax = 1,300-6,000
fmol/mg protein, KD = 0.18-0.25 nM), respectively, as described.
Non-specific binding was determined in the presence of 10 µM naloxone. For data
analysis, the radioactivity counts were normalized to values where 100% represents
the effect of buffer and 0% represents non-specific binding. Five independent
experiments, each done in quadruplicate, were performed and the resulting values
were calculated and pooled to a mean curve which is displayed.
GTPγS binding experiments. The [35S]-GTPγS binding assay was performed
with membrane preparations from HEK 293 cells coexpressing the human µOR
and the PTX-insensitive G-protein subunits Gαo1 or Gαi2. Cells were transiently
transfected using the Mirus TransIT-293 transfection reagent (MoBiTec,
Goettingen, Germany), grown for 48 h, harvested and homogenates were prepared
as described. The receptor expression level (Bmax) and KD values were determined
in saturation experiments with 3H-diprenorphine (specific activity: 30 Ci/mmol,
purchased from Biotrend, Cologne, Germany) (Bmax = 3,700 ± 980 fmol/mg
protein, KD = 0.30 ± 0.093 nM for µOR+Gαo1 or Bmax = 5,800 ± 2,000 fmol/mg,
KD = 0.46 ± 0.095 nM for µOR+Gαi2, respectively). The assay was carried out
in 96-well plates with a final volume of 200 µl. In each well, 10 µM GDP, the
compounds (0.1 pM to 100 µM final concentration) and the membranes (30 µg/ml
final protein concentration) were incubated for 30 min at 37 °C in incubation buffer
containing 20 mM HEPES, 10 mM MgCl2·6H2O and 70 mg/l saponin. After the
addition of 0.1 nM [35S]-GTPγS (specific activity 1,250 Ci/mmol, PerkinElmer,
Rodgau, Germany) incubation was continued at 37 °C for a further 30 min or 75 min
for µOR+Gαo1 or µOR+Gαi2, respectively. Incubation was stopped by filtration
through Whatman GF/B filters soaked with ice cold PBS. Bound radioactivity was
measured by scintillation measurement as described previously.

Data analysis was performed by normalizing the radioactivity counts (ccpm) to
values where 0% represents the non-stimulated receptor and 100% the maximum
effect of morphine or DAMGO. Dose-response curves were calculated by
nonlinear regression in GraphPad Prism 6.0. Mean values ± s.e.m. for EC50 and
Emax values were derived from 3-12 individual experiments each done in triplicate.
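
The normalization and curve model described above can be sketched as follows. This is a minimal illustration that assumes the standard four-parameter logistic (sigmoidal) form for the dose-response curve; the exact model options used in GraphPad Prism are not stated in the text, and all counts and parameters below are hypothetical.

```python
def normalize_percent(counts, basal_counts, reference_max_counts):
    """Scale raw counts so that basal = 0% and the reference agonist maximum = 100%."""
    return 100.0 * (counts - basal_counts) / (reference_max_counts - basal_counts)


def sigmoidal_response(log_conc, bottom, top, log_ec50, hill_slope=1.0):
    """Four-parameter logistic response at a given log10 molar concentration."""
    return bottom + (top - bottom) / (1.0 + 10.0 ** ((log_ec50 - log_conc) * hill_slope))


# Hypothetical counts: basal 1,000, reference agonist maximum 5,000, sample 3,000.
sample_percent = normalize_percent(3000.0, 1000.0, 5000.0)

# At a concentration equal to the EC50 the curve sits halfway between bottom and top.
half_max = sigmoidal_response(log_conc=-8.0, bottom=0.0, top=100.0, log_ec50=-8.0)
print(sample_percent, half_max)
```

A nonlinear regression would then adjust `bottom`, `top`, `log_ec50` and `hill_slope` to minimize the squared difference between this function and the normalized data.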
Gi/o-induced cAMP inhibition. To measure µOR Gi/o-mediated cAMP inhibition,
HEK-293T cells were co-transfected using calcium phosphate in a 1:1 ratio with
human µOR and a split-luciferase based cAMP biosensor (pGloSensor™-22F;
Promega). For experiments including GRK2 co-expression, cells were transfected
with 1 µg/15-cm dish of GRK2. After at least 24 h, transfected cells were washed
with phosphate buffered saline (PBS) and trypsin was used to dissociate the cells.
Cells were centrifuged, resuspended in plating media (1% dialysed FBS in DMEM),
plated at a density of 15,000-20,000 cells per 40 µl per well in poly-lysine coated
384-well white clear bottom cell culture plates, and incubated at 37 °C with 5%
CO2 overnight. For experiments involving pertussis toxin (PTX) inactivation of
Gαi/o, cells were plated with 100 ng/ml final concentration PTX. The next day, drug dilutions
were prepared in fresh assay buffer (20 mM HEPES, 1x HBSS, 0.1% bovine serum
albumin (BSA), and 0.01% ascorbic acid, pH 7.4) at 3x drug concentration. Plates
were decanted and 20 µl per well of drug buffer (20 mM HEPES, 1x HBSS, pH 7.4)
was added to each well. Drug addition to 384-well plates was performed by FLIPR,
adding 10 µl of drug per well for a total volume of 30 µl. Plates were allowed to
incubate for exactly 15 min in the dark at room temperature. To stimulate
endogenous cAMP via β-adrenergic Gs activation, 10 µl of 4x isoproterenol (200 nM final
concentration) diluted in drug buffer supplemented with GloSensor assay substrate
was added per well. Cells were again incubated in the dark at room temperature for
15 min, and luminescence intensity was quantified using a Wallac TriLux microbeta
(Perkin Elmer) luminescence counter. Data were normalized to DAMGO-induced
cAMP inhibition and analysed using nonlinear regression in GraphPad Prism 6.0
(Graphpad Software Inc., San Diego, CA).
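
The plate volumes above imply simple dilution arithmetic, sketched here as a sanity check. The helper name is hypothetical, and the 800 nM isoproterenol stock is inferred from the stated "4x" and "200 nM final" rather than given in the text.

```python
def final_concentration(stock_conc, added_volume_ul, volume_before_ul):
    """Concentration of an added reagent after mixing into the existing well volume."""
    return stock_conc * added_volume_ul / (volume_before_ul + added_volume_ul)


# 10 µl of 3x drug into 20 µl of drug buffer gives 1x drug in 30 µl.
drug_fold = final_concentration(3.0, 10.0, 20.0)

# 10 µl of 4x isoproterenol into 30 µl gives 1x in 40 µl;
# a 200 nM final concentration therefore implies an 800 nM stock (inferred).
isoproterenol_nm = final_concentration(800.0, 10.0, 30.0)
print(drug_fold, isoproterenol_nm)
```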

Determination of functional activity of PZM21-29 for SAR studies was
performed using a BRET-based cAMP accumulation assay. HEK-293T cells
were transiently co-transfected with pcDNA3L-His-CAMYEL42 (purchased from
ATCC via LGC Standards, Wesel, Germany) and human µOR, achieving a cDNA
ratio of 2:2 using Mirus TransIT-293 transfection reagent. 24 h post-transfection,
cells were seeded into white half-area 96-well plates at 20 x 10* cells/well and 
grown overnight. On the following day, phenol-red-free medium was removed and
replaced by PBS and cells were serum starved for 1 h before treatment. The assay
was started by adding 10 µl coelenterazine h (Promega, Mannheim, Germany) to
each well to yield a final concentration of 5 µM. After 5 min incubation, compounds
were added in PBS containing forskolin (final concentration 10 µM). Reads of the
plates started 15 min after agonist addition. BRET readings were collected using a
CLARIOstar plate reader (BMG LabTech, Ortenberg, Germany). Emission signals
from Renilla luciferase and YFP were measured simultaneously using a BRET1
filter set (475-30 nm/535-30 nm). BRET ratios (emission at 535-30 nm/emission
at 475-30 nm) were calculated and dose-response curves were fitted by nonlinear
regression using GraphPad Prism 6.0. Curves were normalized to basal BRET ratio
obtained from dPBS and the maximum effect of morphine and DAMGO. Each
curve is derived from three to five independent experiments each done in duplicate.
Calcium release. Calcium release was measured using a FLIPR Tetra fluorescence
imaging plate reader (Molecular Devices). Calcium release experiments were
run in parallel to Gi/o GloSensor experiments with the same HEK-293T cells
transfected with µOR, except cells for FLIPR were plated in poly-lysine coated
384-well black clear bottom cell culture plates. Cells were incubated at 37 °C with
5% CO2 overnight and next day media was decanted and replaced with Fluo-4
direct calcium dye (Life Technologies) made up in HBSS with 20 mM HEPES,
pH 7.4. Dye was incubated for 1 h at 37 °C. Afterwards, cells were equilibrated to
room temperature, and fluorescence in each well was read for the initial 10 s to
establish a baseline. Afterwards, 10 µl of drug (3x) was added per well and the
maximum-fold increase in fluorescence was determined as fold-over-baseline.
Drug solutions used for the FLIPR assay were exactly the same as used for Gi/o
GloSensor experiments. To activate endogenous Gq-coupled receptors as a positive
control for calcium release, TFLLR-NH2 (10 µM, PAR-1 selective agonist) was used.
Receptor internalization. Internalization was measured using the eXpress
DiscoveRx PathHunter GPCR internalization assay using split β-galactosidase
complementation. In brief, cryopreserved U2OS cells expressing the human µOR
were thawed rapidly and plated in supplied medium and 96-well culture plates.
Next day, cells were stimulated with drugs (10x) and allowed to incubate for
90 min at 37 °C with 5% CO2. Afterwards, substrate was added to cells and
chemiluminescence was measured on a TriLux (Perkin Elmer) plate counter. Data were
normalized to DAMGO and analysed using Graphpad Prism 6.0.
β-Arrestin recruitment assays. β-Arrestin recruitment was measured by either
the PathHunter enzyme complementation assay (DiscoveRx) or by previously
described bioluminescence resonance energy transfer (BRET) methods. Assays
using DiscoveRx PathHunter eXpress OPRM1 CHO-K1 β-Arrestin GPCR Assays
were conducted exactly as instructed by the manufacturer. Briefly, supplied
cryopreserved cells were thawed and resuspended in the supplied medium, and plated
in the furnished 96-well plates. Next day, 10x dilutions of agonist (prepared in
HBSS and 20 mM HEPES, pH 7.4) were added to the cells and incubated for
90 min. Next, the detection reagents were reconstituted, mixed at the appropriate
ratio, and added to the cells. After 60 min, luminescence per well was measured
on a TriLux (Perkin-Elmer) plate counter. Data were normalized to DAMGO and
analysed using the sigmoidal dose-response function built into GraphPad Prism 6.0.
To measure µOR-mediated β-arrestin recruitment by BRET in the presence
or absence of GRK2 co-expression, HEK-293T cells were co-transfected in a
1:1:15 ratio with human µOR containing C-terminal renilla luciferase (RLuc8),
GRK2, and venus-tagged N-terminal β-arrestin-2, respectively. In the case of
experiments where GRK2 expression was varied, pcDNA3.1 was substituted for
GRK2 to maintain the same concentration of DNA transfected. After at least 24 h,
transfected cells were plated in poly-lysine coated 96-well white clear bottom cell
culture plates in plating media at a density of 125,000-250,000 cells per 200 µl per
well and incubated overnight. The next day, media was decanted and cells were
washed twice with 60 µl of drug buffer and incubated at room temperature for at
least 10 min before drug stimulation. 30 µl of drug (3x) was added per well and
incubated for at least 30 min in the dark. Then, 10 µl of the RLuc substrate,
coelenterazine H (Promega, 5 µM final concentration) was added per well, and plates were
read for both luminescence at 485 nm and fluorescent eYFP emission at 530 nm
for 1 s per well using a Mithras LB940 microplate reader. The ratio of eYFP/RLuc
was calculated per well and the net BRET ratio was calculated by subtracting the
eYFP/RLuc ratio obtained without venus-arrestin present from the eYFP/RLuc ratio per well.
Data were normalized to DAMGO-induced stimulation and analysed using
nonlinear regression in GraphPad Prism 6.0.
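
The ratio calculation can be sketched as below. It assumes the conventional net BRET definition, in which the donor-only ratio (no venus-arrestin) is subtracted from each well's eYFP/RLuc ratio. The function names and all readings are hypothetical.

```python
def net_bret(eyfp_emission, rluc_emission, donor_only_ratio):
    """eYFP/RLuc ratio of a well minus the ratio measured without the acceptor."""
    return eyfp_emission / rluc_emission - donor_only_ratio


def percent_of_reference(value, basal, reference_max):
    """Express a net BRET value as a percentage of the reference agonist response."""
    return 100.0 * (value - basal) / (reference_max - basal)


# Hypothetical plate readings (arbitrary units).
basal = net_bret(1250.0, 1000.0, 1.2)         # unstimulated well
damgo_max = net_bret(1500.0, 1000.0, 1.2)     # maximal DAMGO response
test_well = net_bret(1300.0, 1000.0, 1.2)     # test compound
print(percent_of_reference(test_well, basal, damgo_max))
```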
Ligand bias calculation. Multiple approaches have been described to quantitate
ligand bias, including operational models, intrinsic relative activity models,
and allosteric models. In the absence of GRK2, we observe no β-arrestin-2
recruitment for PZM21 and TRV130. This prevents a quantitative assessment
of bias by the operational model. In the case where GRK2 is overexpressed, we
observe arrestin recruitment for PZM21 and TRV130. In this case, we utilize the
operational model to calculate ligand bias and display equiactive bias plots for
comparison of ligand efficacy for distinct signalling pathways. The GloSensor
Gi/o, DiscoverX PathHunter β-arrestin, or net BRET concentration response curves
were fit to the Black-Leff operational model to determine transduction coefficients
(τ/KA). Compound bias factors are expressed after normalization against the
prototypical opioid agonist DAMGO used as a reference. Bias factors are expressed
as the value of ΔΔlog[τ/KA].
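
The bias arithmetic described above can be sketched as follows, assuming the usual two-step normalization: each ligand's log(τ/KA) is first referenced to DAMGO within a pathway, and the two pathway values are then subtracted. The transduction coefficients below are hypothetical numbers, not values from this study, and fitting the Black-Leff model itself is not shown.

```python
def delta_log(ligand_log_tau_ka, reference_log_tau_ka):
    """Δlog(τ/KA): ligand transduction coefficient relative to the reference agonist."""
    return ligand_log_tau_ka - reference_log_tau_ka


def delta_delta_log(g_protein_delta, arrestin_delta):
    """ΔΔlog(τ/KA): positive values indicate bias towards the G protein pathway."""
    return g_protein_delta - arrestin_delta


# Hypothetical log(τ/KA) values for a test ligand and for DAMGO in each pathway.
g_delta = delta_log(ligand_log_tau_ka=8.0, reference_log_tau_ka=8.5)
arr_delta = delta_log(ligand_log_tau_ka=5.5, reference_log_tau_ka=7.5)

bias_log = delta_delta_log(g_delta, arr_delta)
bias_fold = 10.0 ** bias_log
print(bias_log, bias_fold)
```

By construction the reference agonist has ΔΔlog(τ/KA) = 0, so the result reads as bias relative to DAMGO rather than as an absolute property of the ligand.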
Assessment of off-target PZM21 activity. To identify potential off-target
activity of PZM21, we used the National Institute of Mental Health Psychoactive
Drug Screening Program. Compound PZM21 was first tested for activity against
320 non-olfactory GPCRs using the PRESTO-Tango GPCRome screening
β-arrestin recruitment assay. We used 10 µM PZM21 and activity at each receptor
was measured in quadruplicate. Potential positive receptor hits were defined as
those that increase the relative luminescence value twofold. Positive hits were
subsequently re-tested in full dose-response mode to determine whether the
luminescence signal titrates with increasing concentrations of PZM21. A number
of false-positive hits were discounted by this approach. PZM21 inhibition of the hERG
channel was measured as described previously and neurotransmitter transporter
assays were performed using the Molecular Devices Neurotransmitter Assay Kit
(Molecular Devices).


In vivo studies. Adult male C57BL/6J mice (aged 3-5 months) obtained from Jackson
Laboratories (Bar Harbor, Maine) were used to investigate behavioural responses,
respiratory effects, and hyperlocomotion induced by PZM21 and compared
with morphine or vehicle (0.9% sodium chloride). For µOR knockout animals,
Oprm1−/− mice (B6.129S2-Oprm1tm1Kff/J) were obtained from Jackson
Laboratories. All drugs were dissolved in vehicle and injected subcutaneously.
Behavioural studies were conducted at the University of North Carolina and
Stanford University following the National Institutes of Health's guidelines for
care and use of animals and with approved mouse protocols from the
institutional animal care and use committees. Sample sizes (number of animals) were
not predetermined by a statistical method and animals were assigned to groups
randomly. Drug treatment groups were only blinded for measurement of affective
versus reflexive analgesia; other experiments were not blinded to investigators.
Predefined exclusion criteria were set for analgesia and conditioned preference
experiments. No animals were excluded from statistical analysis. Statistical analyses
were performed after first assessing the normality of distributions of data sets and
Levene's test was used to assess equality of variances.

Measurement of analgesia. Analgesia-like responses in were measured as 
previously described** using a hotplate analgesia meter with dimensions of 
29.2 x 26.7 cm with mice restricted to a cylinder 8.9 cm in diameter and 15.2cm 
high (IITC Life Sciences, Woodland Hills, California). Response was measured 
by recording the latency to lick, flutter, or splay hind paw(s), or an attempt to 
jump out of the apparatus at 55°C, with a maximum cut-off time of 30s. Once a 
response was observed or the cut-off time had elapsed, the subject was immedi- 
ately removed from the hotplate and placed back in its home cage. The animals 
were acclimated to the hotplate, while cool, and a baseline analgesic response time 
was acquired several hours before drug treatment and testing. Mice were injected 
with either vehicle (n = 8), morphine (5 mg/kg, n=8 or 10 mg/kg, n=8), TRV130 
(1.2 mg/kg, n= 9) or PZM21 (10 mg/kg, n= 8; 20 mg/kg, n= 11; or 40 mg/kg, 
n= 8). After injection of drug, the analgesic effect expressed as percentage max- 
imum possible effect (%MPE) was measured at 15, 30, 60, 90 and 120 min after 
drug treatment. If animals did not display hind paw lick, splay, or flutter, they 
were removed from the trial. Additionally, if animals attempted to jump out of 
the plate or urinated on the hotplate they were removed from the trial. To assess 
analgesia by the tail-flick assay, a tail-flick analgesia meter (Columbus Instruments, 
Columbus, Ohio) was used. Mice were gently immobilized with a cotton towel and the tail 
base was placed on a radiant light source emitting a constant temperature of 56 °C. 
The tail withdrawal latency was measured at similar time points as the hotplate 
assay after administration of vehicle (n = 8), morphine (5 mg/kg, n = 4; 10 mg/kg, 
n = 8) or PZM21 (10 mg/kg, n = 8; 20 mg/kg, n = 14). The cut-off time for the 
heat source was set at 10 s to avoid tissue damage. Analgesic response times were 
measured as in the hotplate assay. 
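
The text reports analgesia as %MPE but does not spell out the formula. A minimal sketch follows, assuming the commonly used definition %MPE = 100 × (test latency − baseline latency) / (cut-off − baseline latency). The function name and the example latencies are hypothetical and are not taken from the study.

```python
def percent_mpe(test_latency_s, baseline_latency_s, cutoff_s):
    """Percentage maximum possible effect for one animal at one time point.

    Assumed definition (not stated in the source):
    100 * (test - baseline) / (cutoff - baseline).
    """
    if cutoff_s <= baseline_latency_s:
        raise ValueError("cut-off must exceed the baseline latency")
    # The animal is removed at the cut-off, so latencies are capped there.
    capped = min(test_latency_s, cutoff_s)
    return 100.0 * (capped - baseline_latency_s) / (cutoff_s - baseline_latency_s)


# Hypothetical latencies in seconds; 30 s and 10 s are the cut-offs given
# in the text for the hotplate and tail-flick assays.
hotplate = percent_mpe(test_latency_s=20.0, baseline_latency_s=10.0, cutoff_s=30.0)
tail_flick = percent_mpe(test_latency_s=10.0, baseline_latency_s=4.0, cutoff_s=10.0)
print(hotplate, tail_flick)
```

Under this definition an animal that reaches the cut-off scores 100%, and one that responds at its baseline latency scores 0%.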

Analgesia in μOR knockout mice and subcategorization of affective/reflexive 
pain. Oprm1−/− and wild-type C57Bl/6J mice (male; 8-11 weeks) were acclimated 
to the testing environment and thermal-plate equipment for three non-consecutive 
days between 11:00 and 13:00 before any pharmacological studies. Acclimation was 
achieved by individually confining mice within an enclosed semi-transparent red 
plastic cylinder (10 cm depth × 15 cm height) on a raised metal-mesh rack (61 cm 
height) for 30 min, and then exposing each mouse to the thermal-plate equipment 
(non-heated; floor dimensions, 16.5 x 16.5 cm; Bioseb), while confined within a 
clear plastic chamber (16cm length x 16cm width x 30cm height). Acclimation 
exposure to the thermal plate lasted for 30s, and exposure was repeated after 
30 min to mimic the test day conditions. The testing environment had an average 
ambient temperature of 22.6 °C and illumination of 309 lx from overhead fluorescent 
lighting. The same male experimenter (G.C.) was present throughout the 
entire duration of habituation and testing to exclude possible olfaction-induced 
alterations in sensory thresholds®. 

Cutaneous application of a noxious stimulus, or time spent on a hotplate appara- 
tus can broadly elicit several distinct behavioural responses: 1) withdrawal reflexes: 
rapid reflexive retraction or digit splaying of the paw; 2) affective-motivational 
responses: directed licking and biting of the paw, and/or a motivational response 
characterized by jumping away from the heated floor plate. Paw withdrawal reflexes 
are classically measured in studies of hypersensitivity, and involve simple spinal 
cord and brainstem circuits*”. In contrast, affective responses are complex, non- 
stereotyped behaviours requiring processing by limbic and cortical circuits in the 
brain, the appearance of which indicates the subject’s motivation and arousal to 
make the unpleasant sensation cease by licking the affected tissue, or seeking an 
escape route***”-*, To distinguish between potential differential analgesic effects 
of PZM21, mice were placed on the heated apparatus (52.5°C), and the latency to 
exhibition of the first sign of a hindpaw reflexive withdrawal, and the first sign of 
an affective response was recorded. A maximum exposure cut-off of 30s was set 
to reduce tissue damage. Mice were injected with either vehicle (n =6), morphine 
(10 mg/kg, n= 10), or PZM21 (20 mg/kg, n= 13). After injection of drug, the 
analgesic effect on either reflex or attending responses was expressed as percentage 
maximum possible effect (%MPE), and was measured at −30 (baseline), 15, 30, 60, 
90, 120, and 180 min relative to drug treatment. For studies comparing Oprm1−/− 
and wild-type C57Bl/6J mice, the analgesic response in the hotplate assay was 
measured 30 min after injection of vehicle (n = 5 for both genotypes), morphine 
(10 mg/kg, n = 5 for both genotypes) or PZM21 (20 mg/kg, n = 6 for Oprm1−/− and 
n = 5 for wild-type). 

Formalin injection assay. Analgesia to formalin injection was carried out as 
described previously®*. Mice were first habituated for 20 min to the testing 
environment which included a home cage without bedding, food, and water. 
After habituation, vehicle (n = 6), morphine (10 mg/kg, n = 7), or PZM21 (40 mg/kg, 
n = 7) was injected subcutaneously. This was followed by injection of 20 μl 
of 1% formalin in 0.9% saline under the skin of the dorsal surface of the right 
hindpaw. Animals were returned to their home cage and behavioural responses 
were recorded for one hour. Nociception was estimated by measuring the cumula- 
tive time spent by animals licking the formalin-injected paw. As opioids classically 
display two phases of analgesic action, nociceptive behaviour was measured dur- 
ing both the early phase (0 to 5 min) and the late phase (20 to 30 min). In Fig. 4, 
an asterisk indicates a significant difference between drug and vehicle (P < 0.05 
calculated using a one-way ANOVA with Bonferroni correction). 
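
The phase-wise scoring described above can be sketched as follows. This is only an illustration: it assumes licking is logged as (start, end) bouts in seconds after formalin injection, which the text does not state, and the function name and bout times are hypothetical.

```python
def licking_time_in_window(bouts, window_start_s, window_end_s):
    """Sum the part of each licking bout that falls inside the window (seconds)."""
    total = 0.0
    for start, end in bouts:
        # Overlap between the bout and the scoring window; negative means none.
        overlap = min(end, window_end_s) - max(start, window_start_s)
        if overlap > 0:
            total += overlap
    return total


# Hypothetical bouts (seconds after formalin injection), not study data.
bouts = [(60, 120), (280, 320), (1250, 1300)]

early_phase = licking_time_in_window(bouts, 0, 5 * 60)        # 0 to 5 min
late_phase = licking_time_in_window(bouts, 20 * 60, 30 * 60)  # 20 to 30 min
print(early_phase, late_phase)
```

A bout that straddles a window boundary contributes only the portion inside the window, so the second bout adds 20 s to the early phase.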

Mouse plethysmography. Respiration data were collected using a whole-body 
plethysmography system (Buxco Electronics Inc., Wilmington, North Carolina) as 
described®. This method measures respiratory frequency, tidal volume, peak flows, 
inspiratory time, and expiratory time in conscious and unrestrained mice. Briefly, 
Buxco airflow transducers were attached to each plethysmography chamber and a 
constant flow rate was maintained for all chambers. Each chamber was calibrated to its 
attached transducer before the experiment. Animals were first habituated to the clear 
plexiglass chambers for 10 min. Respiratory parameters were recorded for 10 min 
to establish a baseline before injection of vehicle (n= 8), morphine (10 mg/kg, 
n= 8), TRV130 (1.2 mg/kg, n= 8) or PZM21 (40 mg/kg, n= 8). Respiratory 
parameters were then collected on unrestrained mice for 100 min post drug 
injection. To decrease respiratory variability induced by anxiety, mice were shielded 
from view of other animals and experimenter. In Fig. 4, an asterisk indicates a 
significant difference between drug and vehicle (P< 0.05 calculated using a 
repeated measures ANOVA with Bonferroni correction). 

Accumulated faecal boli quantification. To measure constipatory effects of 
morphine and PZM21, we assessed the total accumulated faecal boli as described’. 
Briefly, mice were injected with vehicle (n= 10), morphine (10 mg/kg, n= 16) or 
PZM21 (20 mg/kg, n= 16) and placed within a plexiglass chamber (5cm x 8cm 
x 8cm) positioned on a mesh screen. Mice were maintained without food or water 
for 6h. Faecal boli were collected underneath the mesh on a paper towel and the 
cumulative mass was measured every hour for six hours. In Fig. 4, an asterisk 
indicates a significant difference between drug and vehicle (P< 0.05 calculated 
using a repeated measures ANOVA with Bonferroni correction). 

Open field locomotor response. A photocell-equipped automated open field 
chamber (40cm x 40cm x 30cm; Versamax system, Accuscan Instruments) 
contained inside sound-attenuating boxes was used to assess locomotor activity. 
Baseline ambulation of freely moving mice was monitored over 30 min, followed 
by injection with vehicle (n=7), morphine (10 mg/kg, n=5) or PZM21 (20 mg/kg, 
n=6). Locomotor activity was monitored for another 150 min. In Fig. 4, an asterisk 
indicates a significant difference between drug and vehicle (P< 0.05 calculated 
using a repeated measures ANOVA with Bonferroni correction). 

Conditioned place preference. A three-chambered conditioned place preference 
apparatus (Med-Associates, St. Albans, Vermont) consisting of white or black 
chambers (16.8 x 12.7 x 12.7 cm each) with uniquely textured white mesh or black 
rod floors and separated by a neutral central chamber (7.2 x 12.7 x 12.7cm) was 
used for conditioned place preference testing. On day 1 (preconditioning day), 
mice were placed in the central chamber and allowed to explore freely for 30 min. 
Time spent in each compartment was used to estimate baseline chamber preferences, 
and mice showing a specific chamber bias of more than 70% were not studied 
further. On days 2-9 (conditioning days) mice were injected with either vehicle 
or drug and paired with either the white mesh or the black rod chambers. All 
mice received vehicle on days 2, 4, 6, 8 and drug on days 3, 5, 7, 9. On day 10 (test 
day), mice were again placed in the central chamber as on day 1 and allowed to 
explore freely for 30 min. Time spent in each chamber was expressed as percentage 
preference. Place preference was tested with morphine (10 mg/kg, n= 16), PZM21 
(20 mg/kg, n= 8), or TRV130 (1.2mg/kg, n =7). In Fig. 4, an asterisk indicates a 
significant difference between vehicle and drug chambers (P< 0.05 by one-sample 
t-test with hypothetical value of 50) while NS indicates non-significance (P > 0.05). 
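
The preference score and the one-sample comparison against 50 can be sketched as below. The sketch assumes that percentage preference is the time in the drug-paired chamber divided by the time in both conditioning chambers, with the central chamber left out; the text does not give this detail. The function names and the example values are hypothetical, and only the t statistic is computed (the P value would come from a t distribution with n − 1 degrees of freedom).

```python
import math
import statistics


def percent_preference(time_drug_s, time_vehicle_s):
    """Share of conditioning-chamber time spent on the drug-paired side (%)."""
    return 100.0 * time_drug_s / (time_drug_s + time_vehicle_s)


def one_sample_t(values, mu0=50.0):
    """One-sample t statistic and degrees of freedom against mu0."""
    n = len(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu0) / sem, n - 1


# Hypothetical per-animal preferences (%), not data from the study.
preferences = [55.0, 60.0, 65.0]
t_stat, dof = one_sample_t(preferences)
print(percent_preference(600.0, 400.0), round(t_stat, 3), dof)
```

A hypothetical value of 50 corresponds to equal time in the two chambers, that is, no preference.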

Cataleptic effect. Drug-induced catalepsy was measured in mice using the bar 
test®’, which includes a horizontally placed 3-mm diameter wooden bar fixed 4cm 
above the floor. Mice were habituated with the bar and the environment for 20 min 
before subcutaneous injection of either haloperidol (2 mg/kg, n= 8), morphine 
(10 mg/kg, n = 8), or PZM21 (20 mg/kg, n = 8). To measure catalepsy, both 
forepaws were gently placed on the bar and the length of time during which each 
mouse remained in the initial position was measured. The effect was measured at 
15, 30 and 90 min after drug injection. Maximum cut-off time for each challenge 
was 90s. 

Pharmacokinetics of PZM21. Studies were performed by the Preclinical 
Therapeutics Core and the Drug Studies Unit at the University of California San 
Francisco. Ten mice were injected subcutaneously with 20 mg/kg of PZM21. 
At each time point, 1 ml of blood was collected from three mice and the serum 
concentration of PZM21 was determined by liquid chromatography–mass 
spectrometry (LC/MS). Mice were subsequently sacrificed and entire brains were 
homogenized for determination of PZM21 concentrations by LC/MS. All studies 
were performed with approved mouse protocols from the institutional animal 
care and use committees. 
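
The serum:brain ratio quoted in the caption of Extended Data Fig. 7d follows directly from the two peak concentrations reported there (1,253 ng/ml in serum and 197 ng per g of brain). A short sketch of that arithmetic, with a hypothetical function name and treating the ratio as unitless (1 ml of serum taken as comparable to 1 g of tissue, which is an assumption):

```python
def serum_to_brain_ratio(serum_ng_per_ml, brain_ng_per_g):
    """Ratio of serum concentration to brain concentration."""
    if brain_ng_per_g <= 0:
        raise ValueError("brain concentration must be positive")
    return serum_ng_per_ml / brain_ng_per_g


# Peak values for PZM21 as reported in the Extended Data Fig. 7 caption.
pzm21_ratio = serum_to_brain_ratio(1253.0, 197.0)
print(round(pzm21_ratio, 1))
```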

Metabolism of PZM21. Metabolism experiments were performed as described 
previously®. In brief, pooled microsomes from male mouse liver (CD-1) were 
purchased (Sigma Aldrich) and stored at −75 °C until required. NADPH was 
purchased (Carl Roth) and stored at −8 °C. The incubation reactions were 
carried out in polyethylene caps (Eppendorf, 1.5 ml) at 37 °C. The incubation 
mixture contained PZM21 (80 μM) or positive controls (imipramine and 
rotigotine), pooled liver microsomes (0.5 mg of microsomal protein/ml of incubation 
mixture) and Tris-MgCl2 buffer (48 mM Tris, 4.8 mM MgCl2, pH 7.4). 
The final incubation volume was 0.5 ml. Microsomal reactions were initiated by 
addition of 50 μl of enzyme cofactor solution NADPH (final concentration of 
1 mM). At 0, 15, 30 and 60 min the enzymatic reactions were terminated by addition 
of 500 μl of ice-cold acetonitrile (containing 8 μM internal standard), and 
precipitated protein was removed by centrifugation (15,000 rcf for 3 min). The 
supernatant was analysed by HPLC/MS (binary solvent system, eluent acetonitrile 
in 0.1% aqueous formic acid, 10–40% acetonitrile in 8 min, 40–95% acetonitrile 
in 1 min, 95% acetonitrile for 1 min, flow rate of 0.3 ml/min). The experiments 
were repeated three times independently. Parallel control incubations were 
conducted in the absence of cofactor solution to determine unspecific binding to 
matrix. Substrate remaining and metabolite formation were calculated as a mean 
value ± s.e.m. of three independent experiments by comparing AUC of metabolites 
and substrate after predetermined incubation time to AUC of substrate at time 0, 
assuming a similar ionization rate, corrected by a factor calculated from the AUC 
of internal standard at each time point. 
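
The AUC comparison described above can be illustrated with a short sketch. It assumes that the internal-standard correction is a simple ratio (analyte AUC divided by internal-standard AUC at the same time point); the text does not give the exact correction factor. Function names and AUC values are hypothetical.

```python
import math
import statistics


def percent_remaining(auc_t, auc_is_t, auc_0, auc_is_0):
    """Substrate at time t as % of time 0, each normalised to the internal standard."""
    ratio_t = auc_t / auc_is_t
    ratio_0 = auc_0 / auc_is_0
    return 100.0 * ratio_t / ratio_0


def mean_sem(values):
    """Mean and standard error of the mean for independent experiments."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical results of three independent experiments (% remaining at 60 min).
replicates = [48.0, 50.0, 52.0]
mean_value, sem_value = mean_sem(replicates)
print(percent_remaining(450.0, 90.0, 1000.0, 100.0), mean_value, round(sem_value, 3))
```

Dividing by the internal-standard AUC compensates for run-to-run differences in injection and ionization, which is why the same standard concentration is added to every quenched sample.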

Chemical synthesis. The stereochemically pure isomers of 12 and PZM21 
were synthesized from corresponding (R)- and (S)-amino acid amides, which 
were either commercially available or readily prepared from the corresponding 
acid or ester (see Supplementary Information). The primary amino group was 
dimethylated using an excess of aqueous formaldehyde and sodium triacetoxy- 
borohydride in aqueous acetonitrile. The carboxamides 16a,b were converted to 
primary amines by treatment with borane–tetrahydrofuran complex under reflux 
yielding the diamines 17a,b. Henry reaction of thiophene-3-carbaldehyde with 
nitroethane afforded the nitropropene derivative 18, which was converted into the 
racemic alkylamine 19. Activation with 4-nitrophenyl chloroformate yielded the 
carbamates 20, which were coupled with the enantiopure primary amines 17a,b 
to achieve diastereomeric mixtures of the corresponding ureas 12 and 21. HPLC 
separation using a semi-preparative Chiralpak AS-H column gave the overall eight 
pure stereoisomers of 12 and 21 including PZM21. 

To determine the absolute configuration of the final products and efficiently 
prepare PZM21, we synthesized enantiomerically enriched carbamate 20 and 
coupled it with the corresponding primary amines. For enantiomeric enrichment, 
we performed chiral resolution of the racemic primary amine 19 via repetitive 
crystallization with di-p-anisoyl-(S)-tartaric acid. After triple crystallization, 
we obtained 19 enriched in the dextrorotatory enantiomer ([α]D = +20.5°). The 
corresponding (R)-acetamide has been previously characterized as dextrorotatory 
([α]D = +49.8°), so enantiomerically enriched 19 was treated with acetic anhydride 
and triethylamine, and the specific rotation of the product was measured. 
Based on the value of specific rotation of the resulting acetamide ([α]D = −46.6°), 
we assigned the absolute configuration of the major isomer to be (S). (S)-enriched 
20 was used for synthesis of the final urea derivatives and absolute configura- 
tion of diastereomers in pairs was assigned based on the equality of retention 
time in chiral HPLC. A full description of the synthetic routes and analytical data 
of the compounds 12, PZM21 and its analogues PZM22-29 are presented in the 
Supplementary Information. 
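
The configurational assignment above rests on comparing the sign of the measured rotation with that of the known (R)-acetamide. As a rough illustration, the same two numbers also give an estimate of optical purity. This is an assumption-laden sketch: it supposes that both rotations were measured under comparable conditions and that rotation scales linearly with enantiomeric composition. The source does not report an enantiomeric excess, and the function names are hypothetical.

```python
def assign_configuration(observed_rotation, pure_r_rotation):
    """Return 'R' if the observed sign matches the pure (R) reference, else 'S'."""
    return "R" if observed_rotation * pure_r_rotation > 0 else "S"


def optical_purity_percent(observed_rotation, pure_rotation):
    """Estimated optical purity (%) from the ratio of specific rotations."""
    return 100.0 * abs(observed_rotation) / abs(pure_rotation)


# Specific rotations quoted in the text for the acetamide derivative.
major_isomer = assign_configuration(-46.6, 49.8)
purity = optical_purity_percent(-46.6, 49.8)
print(major_isomer, round(purity, 1))
```

The opposite sign relative to the (R)-acetamide is what supports the (S) assignment; the purity figure is only indicative.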

Detailed modelling of PZM21 and TRV130 binding poses. PZM21 was docked 
to the inactive state μOR structure using DOCK3.6 (ref. 21) as described for the 
primary screen, with the exception that the 45 matching spheres used were gen- 
erated based on the docked pose of compound 12. The resulting ligand-receptor 
complex was further optimized through minimization with the AMBER pro- 
tein force field® and the GAFF ligand force field supplemented with AM1-BCC 
charges. Docking of PZM21 and TRV130 to the active state μOR structure (PDB: 
5C1M) was also performed with DOCK3.6 with parameters as described above. 
The amino terminus of the active state μOR, which forms a lid over the orthosteric 
binding site (residues Gly52-Met65), was removed before receptor preparation. 
Matching spheres were generated based on the pose of PZM21 in the inactive state. 
The resulting complexes were then minimized with AMBER. The pose of PZM21 
in the active state μOR structure was further refined using Glide (Schrödinger) 
in XP mode. 

Molecular dynamics simulations were based on crystal structures of μOR in the 
inactive- and active-state conformation (PDB: 4DKL and 5C1M, respectively). In 
both cases, all non-receptor residues (T4 lysozyme in the inactive state and Nb39 in 
the active state) were removed. For the active state, amino-terminal residues were 
removed as in the docking studies. Initial coordinates of PZM21 were generated 
by molecular docking as described above. The receptor was simulated with two 
tautomers of His297^6.52, either in the neutral Nδ or the Nε state. The μOR–PZM21 
complex was embedded in a lipid bilayer consisting of dioleoylphosphatidylcholine 
(DOPC) molecules as described previously*’. The charges of the inactive- and 
active-state simulation systems were neutralized by adding 11 and 14 chloride 
ions, respectively. To carry out MD simulations, the GROMACS package was 
used as described previously”. Briefly, the general AMBER force field (GAFF)’! 
was used for PZM21 and the lipids and the AMBER force field ff99SB” for the 
receptor. Parameters for PZM21 were assigned using antechamber, and charges 
were calculated using Gaussian09 (Gaussian, Inc.) at the HF/6-31G(d,p) level and 
the RESP procedure according to the literature”*. During the simulations, PZM21 
was protonated at its tertiary amine and simulated as a cation. The SPC/E water 
model”* was used, and the simulations were carried out at 310K. Analysis of the 
trajectories was performed using GROMACS. Each simulation in a given condition 
was initiated from identical coordinates, but with initial atom velocities assigned 
independently and randomly. An overview of the simulation systems and their 
simulation times is shown in the Supplementary Information. 
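
The trajectory analysis in the source was done with GROMACS. Purely as an illustration of the kind of quantity plotted in Extended Data Fig. 2g, the sketch below computes a per-frame distance between two atoms and its average in plain Python. It is not the authors' workflow; the function name and the coordinates (in Å) are hypothetical.

```python
import math
import statistics


def distance_series(traj_a, traj_b):
    """Per-frame Euclidean distance between two atoms given (x, y, z) tuples."""
    return [math.dist(a, b) for a, b in zip(traj_a, traj_b)]


# Hypothetical coordinates (Å) for three frames, not simulation output.
thiophene_atom = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
n127_atom = [(6.0, 0.0, 0.0), (7.0, 0.0, 0.0), (8.0, 0.0, 0.0)]

series = distance_series(thiophene_atom, n127_atom)
mean_distance = statistics.mean(series)
print(series, mean_distance)
```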

Data analysis and reporting. Other than the in vivo studies, no statistical analysis 
was applied to in vitro or cell-based signalling assays. Sample size (number of 
assays for each compound or receptor) was predetermined to be in triplicate 
or quadruplicate for primary screening assays at a single concentration. For 
concentration-response assays, the sample size (number of assays for each 
compound at selected receptors) was also predetermined to be tested for a 
minimum of three assays, each in triplicate or quadruplicate. None of the 
functional assays were blinded to investigators. 
Extended Data Figure 1 | Docking poses of active compounds. Seven of 23 experimentally tested compounds bound to the μOR with micromolar 
affinity. Their docked poses often occupy sites not exploited by the antagonist β-funaltrexamine. In each case, a canonical ionic interaction with 
D147^3.32 is observed. 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


Extended Data Figure 2 (image not reproduced). Panels: a, Gi/o (Glosensor) concentration-response curves for the stereoisomers of 21, including (S,S)-21 (PZM21); b, table of stereoisomer data (below); c, inactive μOR (crystal) with docked PZM21; d, active μOR (crystal) with PZM21 docked to the active and inactive states; e, f, active μOR with docked PZM21 and TRV130; g, simulation, with distances between PZM21 groups and residues Y148, N127, Y326, H297, M151 and D147 plotted against simulation time (0-1,000 ns) for inactive and active μOR. 

Panel b (pEC50 and efficacy from the Gi/o Glosensor assay): 

Cmpd       Substituent  a    b    pKi          pEC50        Efficacy 
12         -H           S/R  S/R  7.37 ± 0.06  6.75 ± 0.06  0.88 ± 0.03 
(S,S)-12   -H           S    S    8.32 ± 0.03  7.19 ± 0.10  0.85 ± 0.04 
(S,R)-12   -H           S    R    7.34 ± 0.04  6.63 ± 0.27  0.38 ± 0.06 
(R,S)-12   -H           R    S    6.59 ± 0.03  6.28 ± 0.18  0.68 ± 0.09 
(R,R)-12   -H           R    R    7.05 ± 0.03  5.51 ± 0.45  0.82 ± 0.48 
(S,S)-21   -OH          S    S    8.97 ± 0.03  8.34 ± 0.07  0.76 ± 0.02 
(S,R)-21   -OH          S    R    8.02 ± 0.03  9.92 ± 0.26  0.18 ± 0.02 
(R,S)-21   -OH          R    S    6.12 ± 0.07  6.28 ± 0.14  0.63 ± 0.07 
(R,R)-21   -OH          R    R    6.07 ± 0.07  5.88 ± 0.24  0.53 ± 0.13 

Extended Data Figure 2 | Stereochemical structure-activity 
relationship. a, As with the different stereoisomers of 12, variation 
of the chiral centres in compound PZM21 results in large changes in 
efficacy and potency. Data are mean ± s.e.m. of normalized results (n = 3 
measurements). b, Structure-activity relationship of compound 12 and 21 
stereoisomers with affinities displayed as pKi values and agonist potency 
and efficacy in a Gi/o Glosensor assay. c, d, PZM21 docked to active 
μOR shows a more extended conformation as compared to the inactive 
state. e, In the docked active state, the PZM21 thiophene extends into 
the specificity-determining region of opioid receptors. Key interacting 
residues here are highlighted as red lines and corresponding residues at 
the other human opioid receptors are indicated. f, Docked pose of TRV130 
within the μOR site, showing minimal overlap in key pharmacophores 
with PZM21 besides the ionic interaction between the cationic amine and 
D147^3.32. g, Molecular dynamics simulations of PZM21 in the inactive 
μOR state (grey and black traces) lead to a stable conformation with 
the thiophene positioned >10 Å away from N127^2.63 (total of 2 μs of 
simulation time over three independent trajectories). In contrast, PZM21 
adopts a more extended pose when simulated with active μOR, with an 
average distance of 6 Å between the thiophene and N127^2.63. Other key 
interactions between μOR and PZM21 are also highlighted. 
Extended Data Figure 3 | Structure-activity relationship defined 
by PZM21 analogues. Eight analogues were synthesized to probe the 
binding orientation of PZM21 and their efficacy as agonists was tested in 
a CAMYEL-based Gi/o signalling assay. Analogues were compared to a 
parent reference compound (PZM22) with similar efficacy and potency to 
PZM21. In each case, the EC50 value for PZM22 is shown in black 
(1.8 nM) and the EC50 for the analogue is coloured. The covalent 
compound PZM29 binds to the μOR:N127C variant irreversibly, as 
evidenced by wash-resistant inhibition of radioligand binding. Signalling 
data are mean ± s.e.m. of normalized results (n = 3 measurements). 
Extended Data Figure 4 | Signalling properties of PZM21 at the opioid 
receptors. Displayed are raw luminescence data from a Gi/o Glosensor 
assay. In agonist mode, agonists decrease luminescence while inverse 
agonists increase it by diminishing basal signalling. For each opioid 
receptor, a prototypical well-characterized agonist (black curves) and … 
… mode, a competition reaction is performed with 50 nM agonist and an 
escalating amount of tested drug. Here, true antagonists increase the 
observed signal, consistent with their ability to compete with the agonist 
but not induce Gi signalling. Data are mean ± s.e.m. of non-normalized 
results (n = 3 measurements). 
Extended Data Figure 5 | PZM21 is selective for μOR. a, Compound 
PZM21 was screened against 320 non-olfactory GPCRs for agonism in … 
… increase in signal twofold over background were further tested in full 
dose-response mode. Several potential targets (GPR110, MCHR1R, 
PTGER1) did not show dose-dependent increase in signal and probably 
represent screening false positives. CXCR7 and SSTR4 did show 
dose-dependent signals at high concentrations of PZM21, and were further 
tested in non-arrestin signalling assays. c, PZM21 does not show a 
dose-dependent change in cAMP inhibition in a Gi/o Glosensor assay 
measuring SSTR4 activation, indicating that the single elevated point 
in b is probably a false positive result. d, e, Inhibition assays of hERG 
(d) and the dopamine transporter (DAT), norepinephrine transporter 
(NET), and serotonin transporter (SERT) (e) show that PZM21 has weak 
inhibitory activity ranging from 2–34 μM at these targets. For a, data are 
mean ± s.e.m. of non-normalized results (n = 4 measurements). For b-e, 
data are mean ± s.e.m. of normalized results (n = 3-6 measurements). 
Extended Data Figure 6 | Signalling profile of PZM21 and other μOR 
agonists. a, PZM21 is an efficacious Gi and Go agonist in a GTPγS assay. 
b, Like other μOR agonists, PZM21 induces a dose-dependent decrease in 
cAMP levels that is sensitive to pertussis toxin, confirming Gi/o-mediated 
signalling. c, Herkinorin is a Gi/o agonist and robustly recruits arrestin in 
a BRET assay performed in the absence of GRK2 overexpression. TRV130 
or PZM21 show undetectable levels of arrestin recruitment in the same 
experiment. d, PZM21 and other opioids show no activity in a calcium-release 
assay, indicating no Gq-mediated second messenger signalling. 
The positive control TFLLR-NH2 efficiently activates the Gq-coupled 
receptor PAR-1. e, PZM21 and TRV130 induce much decreased receptor 
internalization versus DAMGO and even morphine. f, Herkinorin and 
TRV130 are potent agonists of the κOR. PZM21 is a κOR antagonist 
(see Extended Data Fig. 4). g, In HEK293 cells, GRK2 expression levels 
have minimal effect on the potency and efficacy of the unbiased agonist 
DAMGO in a Gi/o activation assay. Increased GRK2 levels change the basal 
cAMP signal due to increased desensitization of μOR, which lowers receptor 
basal activity and leads to elevated isoproterenol-induced cAMP. In an 
arrestin-recruitment BRET assay, increased GRK2 expression increases 
both the potency and maximal efficacy of the unbiased agonist DAMGO. 
This is likely because GRK2-mediated phosphorylation is required for 
efficient β-arrestin recruitment. h, Gi activation and arrestin recruitment 
in cells co-expressing 1.0 μg/15 cm² of GRK2. Notably, PZM21 induces 
a higher maximal level of arrestin recruitment as compared to U2OS 
cells, which express very low levels of GRK2, but this level is significantly 
lower than that of morphine. Despite the lower efficacy for arrestin recruitment 
observed for morphine, TRV130 and PZM21 compared to DAMGO, 
a formal calculation of bias by the operational models fails to show that 
this effect is significant. i, Table of pEC50 values and Emax values for 
various signalling assays. All data are mean ± s.e.m. of results (n = 2-6 
measurements). 


Extended Data Figure 7 (image not reproduced). Panel titles: Hot-plate (affective) and Hot-plate (reflex), time courses (minutes) for vehicle, PZM21 (20 mg/kg) and morphine (10 mg/kg) in wild-type and Oprm1−/− mice; Catalepsy: bar test, catalepsy (seconds) against time (minutes) for PZM21, morphine and haloperidol; PZM21 pharmacokinetics, plasma (ng/ml) and brain (ng/g) concentrations against time (minutes); Phase I metabolism, mouse liver microsomes, for PZM21, rotigotine and imipramine against time (minutes); Metabolite pool signalling, inhibition of cAMP accumulation (normalized to PZM21). 

Extended Data Figure 7 | Additional in vivo studies of PZM21. 
a, Analgesic responses measured in the hotplate assay were 
subcategorized into either affective or reflexive behaviours and scored 
separately. b, Morphine (n = 10 animals) induces changes in both 
behaviours, while PZM21 (n = 13 animals) only modulates the attending 
(affective) component. Knockout of the μOR ablates all analgesic 
responses by morphine and PZM21. c, PZM21 shows minimal cataleptic 
effect compared to morphine at different time points. The effect of 
haloperidol was included as a positive control. d, Pharmacokinetic studies 
of PZM21 (n = 3-4 animals for each time point) show central nervous 
system penetration of the compound, with a peak level of 197 ng 
of PZM21 per g of brain tissue. With a concomitant serum concentration 
of 1,253 ng/ml, this represents a serum:brain concentration ratio of 6.4. 
These levels are similar to those achieved by morphine, which shows a 
peak brain concentration of approximately 300 ng/g and a serum:brain 
concentration ratio of 3.7 30 min after subcutaneous injection”>. 
e, Metabolism of PZM21 over 60 min exposure to mouse liver microsomes. 
Rotigotine and imipramine serve as positive controls for extensive phase 
I metabolism. The total amount of PZM21 and metabolite pool is slightly 
greater than 100% (101.8%), reflecting cumulative error in LC/MS analysis. 
f, A Gi/o signalling assay shows that none of the metabolites are measurably 
more potent activators of the μOR versus PZM21 alone. The metabolite 
pool after the 60-min incubation was used directly in the signalling assay. 
As a negative control, the pooled material from a reaction carried out in 
the absence of the key cofactor NADPH was used in the signalling assay. 
All data are mean ± s.e.m. For e, reactions were run in triplicate and the 
s.e.m. was calculated from individual measurements of each reaction. 
Extended Data Table 1 | Molecules with μOR activity identified in the initial screen 
Extended Data Table 2 | Analogues tested at the μOR 
*The ECFP4 Tanimoto similarity (Tc) to the most similar μOR ligand in ChEMBL16. 
†No measurable activity. 
Extended Data Table 3 | Binding and signalling properties of compounds 12 and PZM21 

                                        12              PZM21 
Ki (nM) 
  μOR                                   42              1.1 
  δOR                                   N.A.†           506 
  κOR                                   464             18 
  nociceptin                            N.D.*           N.D.* 
Gi/o (Glosensor), EC50 (nM) | Emax (%) 
  μOR                                   180 | 88        4.6 | 77 
  δOR                                   N.A.†           1900 | 78 
  κOR                                   N.A.†           N.A.† 
  nociceptin                            1400 | 43       N.A.† 
Arrestin recruitment (PathHunter), EC50 (nM) | Emax (%) 
  μOR                                   940 | 9.4       N.A.† 

*Not determined. 
†No measurable activity. 
Structure of the voltage-gated calcium 
channel Cav1.1 at 3.6 Å resolution 


Jianping Wu, Zhen Yan, Zhangqiang Li, Xingyang Qian, Shan Lu, Mengqiu Dong, Qiang Zhou & Nieng Yan 


The voltage-gated calcium (Cav) channels convert membrane electrical signals to intracellular Ca2+-mediated events. 
Among the ten subtypes of Cav channel in mammals, Cav1.1 is specified for the excitation–contraction coupling of 
skeletal muscles. Here we present the cryo-electron microscopy structure of the rabbit Cav1.1 complex at a nominal 
resolution of 3.6 Å. The inner gate of the ion-conducting α1-subunit is closed and all four voltage-sensing domains adopt 
an ‘up’ conformation, suggesting a potentially inactivated state. The extended extracellular loops of the pore domain, 
which are stabilized by multiple disulfide bonds, form a windowed dome above the selectivity filter. One side of the 
dome provides the docking site for the α2δ-1-subunit, while the other side may attract cations through its negative 
surface potential. The intracellular I–II and III–IV linker helices interact with the β1a-subunit and the carboxy-terminal 
domain of α1, respectively. Classification of the particles yielded two additional reconstructions that reveal pronounced 
displacement of β1a and adjacent elements in α1. The atomic model of the Cav1.1 complex establishes a foundation for 
mechanistic understanding of excitation–contraction coupling and provides a three-dimensional template for molecular 
interpretations of the functions and disease mechanisms of Cav and Nav channels. 


Voltage-gated calcium (Cav) channels activate upon changes in membrane
potential and mediate Ca²⁺ entry, which triggers multitudes
of Ca²⁺-dependent cellular events. Malfunction or dysregulation of
Cav channels is associated with various neurological, cardiovascular,
and muscular disorders, making Cav channels major targets for drug
development.

In mammals, ten Cav subtypes have been identified and classified into
three families, Cav1, Cav2, and Cav3, on the basis of the ion-conducting
α1-subunit. Among these, the Cav1 channels (Cav1.1–1.4) are
L-type high-voltage-activated channels that co-assemble with auxiliary
subunits including the extracellular α2δ, the intracellular β, and the
transmembrane γ. These auxiliary subunits modulate the activation
and inactivation kinetics, gating properties, and membrane trafficking
of the α1-subunit. Cav1 channels, known as DHP receptors, are
responsive to dihydropyridine (DHP) drugs.

Cav1.1, being the first Cavα1 to be cloned, has been a prototype
in the functional, structural, and mechanistic investigations of Cav
channels. Cav1.1 serves as the voltage sensor for excitation–contraction
coupling of skeletal muscles. The action-potential-induced
conformational changes of Cav1.1 activate the type 1 ryanodine
receptor (RyR1), which is responsible for rapid release of Ca²⁺ from the
sarcoplasmic reticulum to cytoplasm, an event that triggers subsequent
muscle contraction.

The cryo-electron microscopy (cryo-EM) structure of the rabbit
Cav1.1 complex containing α1-, α2δ-1-, β1a-, and γ-subunits was previously
determined to 4.2 Å resolution. The four homologous repeats (I–IV) of
α1, each containing six transmembrane segments (S1–S6) organized
into a canonical voltage-gated ion channel fold, are arranged clockwise
in the extracellular view. The γ-subunit, which contains four transmembrane
helices and shares an identical structural fold with claudins,
contacts the voltage-sensing domain of the fourth repeat (VSD_IV). The
von Willebrand factor domain A (VWA) and two cache domains in α2δ
interact with the extended extracellular loops of α1. The cytoplasmic
β1a-subunit was placed in the vicinity of VSD_II. However, owing to the
moderate resolution, side chains were assigned to merely one-quarter
of the molecular mass. Half of the EM map for the extracellular α2δ-subunit
could not be resolved. In particular, the VSDs in repeats II and
III were nearly invisible and the molecular details of the pore domain
remained to be elucidated.


Structural determination of the Cav1.1 complex

Carbon-film-coated grids were used to enrich particles for EM imaging
that yielded the 4.2 Å reconstruction. To increase contrast, we determined
an optimal condition in which sufficient numbers of protein
particles entered ice without carbon film. The new condition resulted
in elimination of orientation preference and resolution of all four VSDs
(Extended Data Fig. 1a–c). Out of 527,833 selected particles (class I),
an EM map was calculated to 3.6 Å according to the gold-standard
Fourier shell correlation 0.143 criterion (Extended Data Figs 1d and 2).
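
As a hedged illustration of the 0.143 criterion: the reported resolution is
commonly taken as the spatial frequency at which the Fourier shell correlation
(FSC) between two independently refined half-maps first falls below 0.143. The
Python sketch below linearly interpolates that crossing. The function name and
the FSC values are hypothetical and are not data from this study.

```python
def fsc_resolution(freqs, fsc, threshold=0.143):
    """Return the resolution (in angstroms) where the FSC first drops below threshold.

    freqs: spatial frequencies in 1/angstrom, in ascending order.
    fsc:   FSC values for the corresponding resolution shells.
    Returns None if the curve never crosses the threshold.
    """
    for i in range(1, len(freqs)):
        prev, cur = fsc[i - 1], fsc[i]
        if prev >= threshold > cur:
            # Linear interpolation between the two shells that bracket the crossing.
            t = (prev - threshold) / (prev - cur)
            crossing = freqs[i - 1] + t * (freqs[i] - freqs[i - 1])
            return 1.0 / crossing
    return None


# Hypothetical FSC curve, for illustration only (not data from this study).
example_freqs = [0.10, 0.20, 0.25, 0.30]
example_fsc = [0.99, 0.80, 0.343, 0.043]
print(fsc_resolution(example_freqs, example_fsc))
```

Real processing packages apply masking and noise-substitution corrections before
reading off this value, so the sketch only conveys the thresholding step.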

The resolution for the majority of the α2δ- and α1-subunits was
beyond 3.5 Å and allowed de novo model building. However, the density
for the β-subunit was still largely missing. After further 3D classification,
a 3.9 Å map was obtained from a subgroup of particles (class Ia)
that showed a discernible β-subunit. Meanwhile, a separate group
of particles (class II) yielded another 3.9 Å map. In addition to the
β-subunit, several intracellular segments of the α1-subunit were well
resolved in the class Ia and class II reconstructions (Extended Data
Figs 1d–f and 2).

On the basis of the three EM maps, a structural model consisting
of 2,661 residues, among which 2,595 had side groups assigned, was
generated for the Cav1.1 complex. In addition to the protein subunits,
14 lipids were built and 25 sugar moieties were assigned to 16 glycosylation
sites, 15 on the α2δ-subunit and 1 on the α1-subunit (Fig. 1
and Extended Data Table 1). The structural assignment was verified by
mass spectrometric (MS) analysis of crosslinked complex and disulfide
bonds (Extended Data Table 2).

¹State Key Laboratory of Membrane Biology, School of Life Sciences and School of Medicine, Tsinghua University, Beijing 100084, China. ²Beijing Advanced Innovation Center for Structural
Biology, School of Life Sciences and School of Medicine, Tsinghua University, Beijing 100084, China. ³Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University,

Figure 1 | Overall structure of the rabbit Cav1.1 complex. The structure
shown here was primarily modelled and refined with the 3.6 Å class I EM
map. The intracellular segments were modelled on the basis of the class Ia
map and VSD_III was built on the class I and class II maps. See Extended
Data Figs 1 and 2 for details of cryo-EM analysis. The structure is colour-coded
for distinct subunits. The four homologous repeats (repeats I–IV)
of the α1-subunit are coloured with increasingly darker green. The same
colour scheme is used in Figs 2–6 unless otherwise indicated. The glycosyl
moieties and lipids are shown as black sticks. All structure figures were
prepared with PyMol.


Structure of the α2δ-subunit

In the new 3.6 Å map, two additional cache domains were resolved in
the α2δ-subunit (Fig. 2a and Extended Data Figs 3 and 4). Therefore,
the α2δ-subunit in total comprises four tandem cache domains and
one VWA domain. Despite distinct domain organization in the three-dimensional
space, the five domains are intertwined in the primary
sequence (Extended Data Fig. 4a).

The δ-subunit, which is separated from the α2-subunit in the primary
structure through proteolytic cleavage, completes the fourth cache
domain by contributing three β-strands (Fig. 2a and Extended Data
Fig. 4a, b). The ensuing segment of the δ-subunit displays an extended
conformation, wrapping alongside the concave side of cache1 and
cache2 and the top surface of the VWA domain before reaching the
extracellular loops of the α1-subunit. The last unambiguously assigned
residue of the δ-subunit in the EM map is Cys1074, which forms a
disulfide bond with Cys406 in the VWA domain (Fig. 2b and Extended
Data Fig. 4b). The δ-subunit was predicted to possess a single transmembrane
helix formed by the carboxy (C)-terminal hydrophobic sequence
(Fig. 2b). A recent characterization suggested that the δ-subunit
may be anchored to the membrane through a glycophosphatidylinositol
modification, which appears to be supported by the structure and MS
characterizations. We modelled an ethanolamine to the density following
Cys1074 (Fig. 2b). See the legend of Extended Data Fig. 4b for details.

The extended conformation of δ is stabilized through multiple intra-
and inter-subunit disulfide bonds. In total, four disulfide bonds were
observed between the α2- and δ-subunits and two within the δ-subunit
(Fig. 2c and Extended Data Table 2a). On the surface of the α2δ-subunit,
15 out of 16 predicted glycosyl moieties were identified (Fig. 2c).

The VWA domain exhibited an ‘open’ conformation in the
4.2 Å structure despite the lack of density for a metal ion in the
metal-ion-dependent adhesion site (MIDAS) motif. In the current
map, the MIDAS residues, Ser263, Ser265, Asp261, Thr333, and
Asp365, coordinate a density that should correspond to a cation. As
the protein was purified in the presence of 10 mM Ca²⁺, we tentatively
assigned a Ca²⁺ into the map (Fig. 2d and Extended Data Fig. 4e). In
addition to the MIDAS residues, Asp78 of α1, which is located on the
L1–2_I loop, also contributes to ion coordination, providing a structural
interpretation for the MIDAS-dependent augmentation of cell surface
density of the Cav channels.


The closed channel 

On the basis of the three maps, an atomic model of the α1-subunit was
generated including the pore domain, the four VSDs, the intracellular
α1-interacting domain (AID)-containing helix between repeats I and II
(the I–II helix), and the intact III–IV linker. The invisible segments
include 40 residues following the AID in linker I–II, 100 residues
between repeats II and III, and the C-terminal residues after Asp1515
(Fig. 3a, Extended Data Fig. 5 and Supplementary Fig. 1). The assignment
of all the side groups of the pore domain allows close examination
of the channel permeation path (Fig. 3b).

The extended extracellular loops, which are stabilized by multiple
intra-loop disulfide bonds, form a windowed dome above the selectivity
filter (Fig. 3c and Extended Data Fig. 6a, b). The window formed by the
L5 loops from repeats I, II, and III is enriched in negatively charged
residues, representing a potential main entrance for Ca²⁺ to the selectivity
filter vestibule, while the one encircled by L5_III and L6_IV may also allow
passage of Ca²⁺ ions (Fig. 3d and Extended Data Fig. 6c). On the other
side, loops L1–2_I, L5_II and L5_III together constitute the docking site for
the α2δ-subunit (Extended Data Fig. 6d).

Despite the fact that the carboxylate groups of Glu and Asp are
invisible in the EM map owing to radiation damage, the high-quality
map of the P1 and P2 helices allows accurate backbone assignment of
the residues constituting the selectivity filter vestibule, including the
critical EEEE residues (Glu292/614/1014/1323) that provide the side
groups and the two preceding residues in each repeat that contribute
the carbonyl oxygens (C=O) (Fig. 3e and Extended Data Fig. 5d). A
consecutive stretch of density stands along the selectivity filter vestibule
that can be deconvoluted to a round disc in the centre of the four Glu
residues and a sphere surrounded by the eight C=O groups. We tentatively
assigned two Ca²⁺ ions to the middle disc and the inner sphere
(Fig. 3e and Extended Data Fig. 7a). The structural assignment of the
selectivity filter and Ca²⁺ ions supports previous characterizations of
the residues critical for ion selectivity.

Figure 2 | Structure of the extracellular α2δ-1-subunit.
a, The α2δ-1-subunit comprises one VWA domain and four tandem cache
domains. The extracellular region of the α1-subunit is shown in
semi-transparent surface view. The δ-subunit is coloured orange and the
α2-subunit is domain coloured. A topological cartoon of the α2δ-1-subunit
is presented in Extended Data Fig. 4a. b, The structure appears to support
glycophosphatidylinositol modification of the cleavage-exposed C terminus
(Cys1074) of the δ-subunit. The primary sequence of the C terminus of the
δ-subunit is shown. The red arrow indicates the potential cleavage site.
The 5σ EM map that extends beyond Cys1074 may correspond to the
ethanolamine of the glycophosphatidylinositol. c, The glycosylation
sites (black sticks) and disulfide bonds (spheres) identified in the
cryo-EM structure of the α2δ-1-subunit. Mass spectrometric analysis
is summarized in Extended Data Table 2. d, A metal ion coordinated by
the MIDAS motif in the VWA domain of α2. The bound ion is shown as a
purple sphere.

Figure 3 | The ion permeation path of Cav1.1. a, The overall structure of
the α1-subunit. The CTD is omitted in the right panel to better illustrate
the III–IV linker. The tentatively assigned Ca²⁺ ions in the selectivity
filter vestibule are presented as green spheres. b, The permeation path of
the pore domain. The ion-conducting passage, calculated by HOLE, is
illustrated by brown dots in the middle panel. The pore radii along the
pore are tabulated on the left. The cut-open extracellular views of the
electrostatic potentials calculated in PyMol are shown for the indicated
layers. c, The extracellular loops of the pore domain are stabilized by
multiple disulfide bonds. d, The extracellular loops of the pore domain
form a domed window above the selectivity filter. While the extended
loops between S5 and P1 helices in repeats II and III (designated the L5_II
and L5_III loops, respectively) provide the primary docking site for the
VWA domain in α2, a cavity enclosed by the negative charges on L5_I, L5_II,
and L5_III may represent the major extracellular entrance for Ca²⁺ ions.
e, The selectivity filter of Cav1.1. A stretch of EM density alongside the
selectivity filter vestibule can be deconvoluted to a sphere at the inner site
and a flattened disc in the middle. Two Ca²⁺ ions are tentatively assigned.
See Extended Data Figs 5d and 7 for detailed analysis. f, Closed inner gate.
Four hydrophobic residues, Val329, Phe656, Phe1060, and Phe1376, on
the S6 bundle seal the inner gate. The closure is buttressed by additional
hydrophobic residues, particularly those on S6_I and S6_III.

We used 10 mM and 0.5 mM Ca²⁺ ions for protein purifications that
yielded the present and previous EM reconstructions, respectively.
The shape and position of the density in the selectivity filter vestibule in
the current map, even when low-pass filtered to 4.2 Å, are distinct from
those in the previous map (Extended Data Fig. 7a–c). The heights
of the two Ca²⁺ ions in the current structure are similar to those in
CavAb, except that the inner one is slightly off the central axis and
closer to repeats I and II (Extended Data Fig. 7d).

Below the selectivity filter vestibule is the typical hydrophobic cavity
with side portals that are penetrated by transverse lipids, a feature
observed in bacterial Nav channels (Fig. 3b and Extended Data
Fig. 6a, b). The asymmetric S6 bundle of Cav1.1 screws tightly at the
inner gate. Three aromatic residues at the corresponding positions on
S6 from repeats II–IV (Phe656/1060/1376) together with Val329 on
S6_I completely seal the pore from the cytoplasm. Below the aromatic
ring are hydrophobic residues on S6_I and S6_III that buttress the closure
(Fig. 3f).

Figure 4 | The VSDs of Cav1.1. a, The four VSDs exhibit similar but
non-identical conformations. When the four VSDs are superimposed, the
ensuing S4–5 connecting helices deviate from each other. b, Structural
comparison of VSDs between Cav1.1, NavRh, NavAb, and the Nav1.7
VSD4–NavAb chimaera. For visual clarity, only VSDy of Cav1.1 is shown.
The VSDs of NavRh, NavAb and the chimaera (PDB accession numbers
4DXW, 3RW0, and 5EK0, respectively) are coloured pale cyan, wheat,
and yellow, respectively. c, Sequence alignment of the S4 segments from
the rabbit Cav1.1 and human Cav channels. The gating charge residues
are shaded yellow. The positive residues that are one residue shifted
from the positions of gating charges are shaded cyan. All the gating
charges are labelled R1–R6 despite the presence of both Arg and Lys. See
Supplementary Fig. 1 for the complete sequence alignment. d, Comparison
of the four S4 segments in Cav1.1. The gating charge residues and the
one-amino-acid-shifted positively charged residues are shown as thicker and
thinner sticks, respectively. e, Structures of the four VSDs in Cav1.1. The
gating charges on S4 and the An1, An2, and the occluding Phe on S2 are
shown as sticks. Asp500 on S3_II and Asp1186 on S3_IV are also shown.


The VSDs of Cav1.1

All the VSD segments, except S3 and S4 in VSD_III, were elucidated in
the 3.6 Å map. The class II map exhibits a better resolved VSD_III. The
structure of VSD_III was thus generated on the basis of class I and II maps
with the S3_III segment (residues 862–885) built as poly-Ala (Extended
Data Fig. 5a).

The four VSDs have similar but non-identical structural features
(Fig. 4a). In contrast to NavAb and NavRh, where the S3 segments are
largely unfolded on the extracellular side, the Cav1.1-S3 segments are
full transmembrane helices (Fig. 4b). In fact, the structures of Cav1.1
VSDs are similar to that of the Nav1.7 VSD4–NavAb chimaera, with
a root mean squared deviation of 1.6 Å over 109 Cα atoms between
the chimaera-VSD and Cav1.1-VSDy, implying structural similarities
between the eukaryotic Cav and Nav channels (Fig. 4b).
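
For readers unfamiliar with the metric, the root mean squared deviation quoted
above is the square root of the mean squared distance between paired atoms. The
Python sketch below is a minimal illustration that assumes the two coordinate
sets are already optimally superimposed and paired one-to-one; the function
name and the toy coordinates are hypothetical, and the superposition step used
by structure-alignment software is not shown.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean squared deviation (same units as the input) between paired atoms.

    coords_a, coords_b: equal-length sequences of (x, y, z) tuples that are
    assumed to be already superimposed and matched atom by atom.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates for illustration only: every atom is displaced by 2 along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
set_b = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(set_a, set_b))
```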

Comparison of rabbit Cav1.1 with the ten human Cav channels
reveals up to six gating charge residues on each S4 segment (Fig. 4c
and Supplementary Fig. 1). For simplicity, we label the corresponding
residues in each S4 segment R1–R6 despite the presence of both Arg
and Lys. The rabbit Cav1.1 has R1–R5 on S4_I, R2–R6 on S4_II, R1–R5 on
S4_III and R2–R5 on S4_IV (Fig. 4c, d).

Figure 5 | Interactions between the α1- and γ-subunits. a, The
γ-subunit interacts with VSD_IV of the α1-subunit. An extracellular view
and a side view of the α1- and γ-subunits are shown. b, The structure of
the γ-subunit (residues 1–216). c, The interface between the γ-subunit
and VSD_IV is mediated by van der Waals contacts in the transmembrane
region. A large number of hydrophobic residues in the S3 and S4 segments
of α1-VSD_IV and TM2 and TM3 of γ constitute the transmembrane
interface between the two subunits. d, Polar interactions between the
C terminus of the γ-subunit and the III–IV linker and the S4–5_IV segment
of the α1-subunit. The potential hydrogen bonds are represented by red
dashed lines.

All the gating charges are aligned on one side of the 3₁₀ helix of S4
in all four VSDs, a state similar to NavAb and its chimaera with Nav1.7,
but different from NavRh (Fig. 4d, e). The R5 residues and R6_II are
below, whereas the R1–R4 residues are above the conserved occluding
Phe in the charge transfer centre, representing the depolarized or ‘up’
conformation of VSDs (Fig. 4e). The negative residue on S3 that constitutes
the charge transfer centre is found in VSD_II and VSD_IV, but not
the other two VSDs. The map quality of VSD_III does not support reliable
analysis of detailed interactions; otherwise, R3 and R4 are coordinated
by the An1 residue, and R5 interacts with An2 on the S2 segments
in all the VSDs (Fig. 4e and Extended Data Fig. 5a). Considering the
closed pore and the ‘up’ VSDs, the structure of Cav1.1 shown here may
represent a potentially inactivated state.


Interactions between the α1- and γ-subunits

In addition to the four transmembrane helices in the γ-subunit resolved
in the previous 4.2 Å structure, the new map further elucidates an
extracellular β sheet and the cytosolic amino (N)- and C-terminal loops
of the γ-subunit and reveals detailed interactions with α1 (Fig. 5). The
transmembrane interface between α1 and γ, mediated by transmembrane
helices 2 and 3 (TM2 and TM3) in γ and S3 and S4 in α1-VSD_IV,
is entirely constituted by hydrophobic residues, which are unlikely to
provide the specificity between γ and VSD_IV (Fig. 5c).

On the intracellular side, the C-terminal loop of γ is located between
the III–IV linker and the S4–5_IV helix of α1. Several polar or charged
residues on the III–IV linker and the S4–5_IV helix of α1 may form
hydrogen bonds with the C-terminal residues of γ, which may confer
the specificity for the association of γ to the fourth, but not the
other VSDs (Fig. 5d). The direct contacts between γ-TM2 and α1-S4_IV
may affect the conformational changes of the latter segment during
voltage-dependent activation or inactivation, thereby providing the
molecular basis for the antagonistic and other modulation effects of
γ on the channel properties.

Figure 6 | Conformational changes of the intracellular elements of
Cav1.1. a, The intracellular I–II linker of α1 and the β-subunit resolved
in class Ia and class II maps. The maps were generated in Chimera.
b, The pronounced shifts of β and the intracellular segments of α1 between
the two 3.9 Å structures. The orange arrows indicate the movements of the
intracellular domains from class Ia to class II. c, Conformational changes
of the intracellular segments of α1. To highlight the differences between
class Ia and class II structures, the S6_I, AID, and S6_II segments are coloured
cyan and yellow, respectively, while other segments are coloured grey in
the top panel. Additional residues up to Met686 were resolved in class II
maps for S6_II. See the Supplementary Video for the morph illustrating the
conformational changes.


Distinct conformations of the intracellular domains

The secondary structural elements of the β-subunit are resolved in the
class Ia and II maps, supporting reliable docking of the crystal structure
of AID-bound β-subunit. The AID motif is part of a seemingly rigid
helix that represents the intracellular extension of S6_I bent by approximately
60° and lies along one diagonal axis of the α1-subunit on the intracellular
surface (Fig. 6a and Extended Data Fig. 8a). Residues 670–686
constituting the extension of the S6_II helix are resolved in the class II,
but not class Ia, map (Fig. 6b, c and Extended Data Fig. 8b). The S6_II
extension is adjacent to the β-subunit, consistent with the MS analysis
of crosslinked Cav1.1 complex (Fig. 6b and Extended Data Table 2b).

The AID motif is sandwiched between α1-VSD_II and β. This structural
feature predicts that conformational changes of S6_I or VSD_II
would be translated to the motion of the β-subunit through the I–II linker
helix. Supporting this notion, comparison of the class Ia and II reconstructions
reveals shifts of the C terminus of S6_I and the ensuing I–II
helix. Meanwhile, the β-subunit undergoes a pronounced displacement
between the two reconstructions (Fig. 6b, c, Extended Data Fig. 8b and
Supplementary Video).

The intact III–IV linker, which is the shortest among the three
inter-repeat linkers of α1, is resolved in all three maps. Residues
1,083–1,094, which connect to the S6_III segment through a turn, form a
helix that together with the helices in the carboxy-terminal domain (CTD)
completes a globular domain (Figs 3a, 6a and Extended Data Fig. 8c).
The succeeding linker residues 1,095–1,105 interact with the C terminus
of the γ-subunit (Fig. 5d).


Perspective
Cav1.1 and RyR1 are the principal membrane components for excitation–contraction
coupling. Structural determination of the two Ca²⁺ channels
illuminates the avenue towards mechanistic understanding of this
fundamental physiological process. However, the complex
formation between Cav1.1 and RyR1 has not been recapitulated
biochemically in vitro, probably owing to weak affinities between individual
proteins or dissociation of bridging components during protein
isolation. The concerted motions of the α1-segments and the β-subunit
revealed in this study provide important clues to the molecular understanding
of excitation–contraction coupling, as the β-subunit is an
indispensable component for the formation of the ultrastructure
between Cav1.1 and RyR1 (refs 38–40). The advent of atomic structures
of the Cav1.1 complex and RyR1 establishes a solid foundation
for future investigations of excitation–contraction coupling using
biophysical and biochemical approaches such as electron tomography
and super-resolution imaging.

The structural elucidation of the proximity between the β1a-subunit
and the extended S6_II segment and VSD_II of α1, as well as the observation
that the β-subunit-interacting AID is part of the bent extension of S6_I,
provides an important clue to understanding the coupling mechanism
between the β-subunit and the S6 segments in the voltage-dependent
inactivation of Cav channels. Despite the fact that the conformational
states of the class Ia and II structures remain to be defined, the
observed conformational changes of the intracellular segments in the
α1-subunit and the concordant movement of the β-subunit lay out
an important foundation for future investigations (Supplementary
Video).

The Cav channels and closely related Nav channels play a major role
in multitudes of physiological and pathological processes. Hundreds of
disease-associated mutations have been identified in these channels. The
atomic model of the Cav1.1 complex presented here provides the structural
template for mechanistic interpretation of a large body of experimental
and clinical observations concerning Cav and Nav channels
(Supplementary Figs 1 and 2).

One unexpected structural finding is the formation of a globular
domain by the CTD and the III–IV linker helix of the α1-subunit.
Consistent with the sequence similarities between Cav and Nav channels,
structural comparison of the Cav1.1-CTD with the CTDs of Nav1.2
and Nav1.5 reveals an identical fold of the first four helices and intervening
β-strands (Supplementary Fig. 2). However, marked
conformational deviations occur in the last two helices. The α5 and
α6 helices in Cav1.1-CTD form a hairpin that interacts with the
III–IV helix. In contrast, the α5 helix in the Nav-CTDs is almost the
equivalent of the III–IV helix in Cav1.1, and the IQ motif-containing
α6 helix, which is substantially longer than that in Cav1.1-CTD,
adopts an extended conformation and interacts with Ca²⁺-loaded
calmodulin in the crystal structures (Supplementary Fig. 2, inset).
The differences between the Cav1.1-CTD in the context of the full-length
protein and the isolated Nav-CTD in complex with calmodulin
may suggest potential conformational changes of the Cav-CTDs
upon binding to calmodulin in different Ca²⁺ loading states. Further
investigation of this caveat may shed light on the mechanistic understanding
of calcium-dependent inactivation of Cav channels.
Finally, the structural similarity between the III–IV linker in the
intact Cav1.1 channel and an isolated Nav channel inactivation gate
will facilitate mechanistic understanding of fast inactivation of Nav
channels (Supplementary Fig. 2).

In sum, the EM structure of the Cav1.1 complex at near-atomic
resolution presented here serves as the template for homologous
modelling and structure-based engineering of related Cav and Nav channels,
establishes the framework for computational and experimental
characterizations and interpretations of the function and disease
mechanism of these channels, and provides the basis for structure-guided
ligand design.

METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized. The investigators were not blinded to allocation during 
experiments and outcome assessment. 

Purification of the rabbit Cav1.1 complex. Purification of the Cav1.1 complex
from rabbit skeletal muscle was performed essentially as described previously except
for two major variations. First, the calcium concentration was increased from 0.5 mM to
10 mM. Second, the protein solution was applied to grids without size-exclusion
chromatography purification to achieve high concentration. The protein was
first concentrated by Amicon Ultra-15 Centrifugal Filter Unit with Ultracel-100
membrane (Merck Millipore) after elution from Glutathione Sepharose 4B resin
(GE Healthcare). When the volume reached approximately 2 ml, the solution
was transferred to an Amicon Ultra-4 Centrifugal Filter Unit with Ultracel-100
membrane (Merck Millipore) for further concentration. The final protein volume
was about 50 μl at a concentration of approximately 2 mg/ml. The protein was
mixed with 18 mM pregabalin (Adamas) for 30 min on ice before cryo-sample
preparation.

Mass spectrometric analysis. Crosslinking of protein coupled with mass spectrometry
analysis was performed as described previously. For the disulfide bond analysis,
purified Cav1.1 complex was precipitated with 20% TCA, washed with cold acetone
twice and then dissolved at 0.5 μg/μL in 8 M urea, 2 mM N-ethylmaleimide, 0.1 M
Tris, pH 6.5. Following Lys-C digestion at a 1:100 enzyme:substrate ratio for 4 h
at 37 °C, the sample was diluted to 2 M urea with 0.1 M Tris, pH 6.5 for further
digestion with trypsin (1:20, 12 h, 37 °C), elastase (1:40, 12 h, 37 °C), or trypsin
(1:20, 12 h, 37 °C) followed by Glu-C (1:20, 10 h, 25 °C). One aliquot of sample
was directly diluted to 2 M urea and digested with proteinase K (1:20, 3 h, 37 °C).
To remove N-linked glycans from Cav1.1 peptides, PNGase F was added 2 h before
the digestion was quenched with 5% formic acid.

LC-MS/MS analysis was performed using a Q-Exactive mass spectrometer
equipped with an Easy Nano-LC 1000 liquid chromatography system. Digested
peptides were loaded onto a 75 μm × 6 cm pre-column that was packed with
10 μm, 120 Å ODS-AQ C18 resin (YMC, Kyoto, Japan) and connected to a
75 μm × 10 cm analytical column packed with 1.8 μm, 120 Å UHPLC-XB-C18 resin
(Welch Materials, Shanghai, China). The peptides were separated over a 77-min linear
gradient from 3% buffer B (100% acetonitrile, 0.1% formic acid), 97% buffer A
(0.1% formic acid) to 27% buffer B, followed by a 6-min gradient from 27% to 80%
buffer B, then maintained at 80% buffer B for 10 min. The flow rate was 200 nl/min.
The MS parameters were R = 140,000 in full scan, R = 17,500 in higher-energy
collisional dissociation MS2 scan; the 30 most intense ions in each full scan were
selected for higher-energy collisional dissociation; the AGC targets were 10⁶ for
Fourier transform mass spectrometry full scan and 5 × 10⁴ for MS2; minimal signal
threshold for MS2 was 4 × 10⁴; precursors having a charge state of +1, higher
than +6 or unassigned were excluded; normalized collision energy was set to 27;
peptide match preferred.
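
The three-step gradient described above can be written as a piecewise-linear
function of time. The Python sketch below is only an illustration: it assumes
the three segments run back to back starting at t = 0 (93 min in total), and
the function name is hypothetical rather than part of any instrument software.

```python
def percent_buffer_b(t_min):
    """Percentage of buffer B at time t_min (minutes) into the gradient.

    Segments, assumed to be consecutive from t = 0:
      0-77 min   linear from 3% to 27% B
      77-83 min  linear from 27% to 80% B
      83-93 min  held at 80% B
    """
    if t_min < 0 or t_min > 93:
        raise ValueError("time is outside the 0-93 min gradient")
    if t_min <= 77:
        return 3 + (27 - 3) * t_min / 77
    if t_min <= 83:
        return 27 + (80 - 27) * (t_min - 77) / 6
    return 80.0


# Print the composition at the segment boundaries and at one midpoint.
for t in (0, 77, 80, 83, 93):
    print(t, percent_buffer_b(t))
```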

The raw data were pre-processed using pParse, and co-eluting precursor ions
were excluded. To identify disulfide-linked peptides, the MS data were searched
using pLink-SS against a protein database containing the sequences of the five
subunits of Cav1.1 and all the proteases used. The pLink search parameters were
as follows: three missed cleavage sites for trypsin; eight missed cleavage sites for
Trypsin/Glu-C; for elastase and proteinase K, no specificity; peptide length 4-25
amino acids; fixed modification of −1.007285 Da on cysteine and the disulfide
mass was set to zero; deamidation on asparagine was set as variable modification.
The search results were filtered by requiring <10 ppm mass deviation between an
observed precursor mass and the monoisotopic mass of the matched candidate,
E < 0.001, and false discovery rate < 0.05. The spectra of the disulfide-bonded
peptides thus obtained were labelled and verified manually.
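
The three filtering cutoffs above can be illustrated with a minimal sketch. This
is not pLink code: the `CandidateMatch` record, its field names, and the
per-match `q_value` used to stand in for the false discovery rate are
assumptions made for illustration, and the example masses are invented.

```python
from dataclasses import dataclass


@dataclass
class CandidateMatch:
    observed_mass: float     # observed precursor mass, Da
    theoretical_mass: float  # monoisotopic mass of the matched candidate, Da
    e_value: float           # E-value reported by the search engine
    q_value: float           # hypothetical per-match FDR estimate


def ppm_deviation(observed: float, theoretical: float) -> float:
    """Signed mass deviation in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def passes_filters(m: CandidateMatch, max_ppm: float = 10.0,
                   max_e: float = 1e-3, max_fdr: float = 0.05) -> bool:
    """Apply the three cutoffs stated in the text to one candidate."""
    ppm = abs(ppm_deviation(m.observed_mass, m.theoretical_mass))
    return ppm < max_ppm and m.e_value < max_e and m.q_value < max_fdr


# Invented example values, chosen only to exercise each cutoff.
matches = [
    CandidateMatch(2000.010, 2000.000, 1e-5, 0.01),  # about 5 ppm, kept
    CandidateMatch(2000.030, 2000.000, 1e-5, 0.01),  # about 15 ppm, rejected
    CandidateMatch(2000.010, 2000.000, 1e-2, 0.01),  # E-value too large, rejected
]
kept = [m for m in matches if passes_filters(m)]
```

In practice the false discovery rate is usually controlled over the whole result
set rather than per match, so the `q_value` field is a simplification.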

Sample preparation and cryo-EM data acquisition. Aliquots of 4 μl of purified
Cav1.1 complex at a concentration of approximately 2 mg/ml were placed on
glow-discharged holey carbon grids (Quantifoil Cu R1.2/1.3, 300 mesh). Grids
were blotted for 2 s and flash-frozen in liquid ethane cooled by liquid nitrogen
using a Vitrobot Mark IV (FEI). Grids were transferred to a Titan Krios (FEI)
electron microscope that was operating at 300 kV with a nominal magnification of
22,500. Images were recorded manually using a K2 Summit electron counting
direct detection camera (Gatan) in super-resolution mode and binned to a pixel
size of 1.32 Å. Defocus values varied from 1.3 to 2.9 μm. Each image was acquired
at an exposure time of 8 s and dose-fractionated to 32 frames with a dose rate of
about 8.2 counts (or ~10.9 electrons) per second per physical pixel. UCSFImage4
was used for all data collection.
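
As a rough consistency check on the acquisition parameters, the accumulated
dose can be estimated from the dose rate, exposure time and pixel size. The
sketch below assumes that the 1.32 Å binned pixel corresponds to one physical
detector pixel; the function name is illustrative only.

```python
def total_dose_per_A2(rate_e_per_pixel_s: float, exposure_s: float,
                      pixel_size_A: float) -> float:
    """Accumulated dose in electrons per square angstrom."""
    electrons_per_pixel = rate_e_per_pixel_s * exposure_s
    return electrons_per_pixel / pixel_size_A ** 2


# Values taken from the text: ~10.9 e-/pixel/s, 8 s exposure, 1.32 A pixel.
total = total_dose_per_A2(10.9, 8.0, 1.32)
per_frame = total / 32  # the exposure was dose-fractionated into 32 frames
print(f"total dose ~{total:.1f} e-/A^2, ~{per_frame:.2f} e-/A^2 per frame")
```

Under that assumption the total comes to roughly 50 e⁻/Å², in line with the
electron dose listed in Extended Data Table 1.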

Image processing. A simplified diagram of the procedure for image processing is
presented in Extended Data Fig. 2a. A total of 9,704 cryo-EM micrographs were
collected. The images were aligned and summed using whole-image motion
correction. The contrast transfer function parameters were estimated using
CTFFIND3. Templates for reference-based particle picking were obtained from
the 2D class averages calculated from ~3,000 manually picked particles. A total
of 1,630,272 particles were picked by RELION 1.4 (ref. 56) using templates
low-pass filtered to 20 Å to limit reference bias. Subsequent 2D and 3D
classifications and refinements were performed using RELION 1.4.

Two rounds of reference-free 2D classification were performed to further
remove ice spots, contaminants, and aggregates, yielding 1,222,388 particles.
The particles were directly subjected to an auto-refine procedure, resulting in a
4.1 Å map. The initial model used during auto-refinement was from the 4.2 Å
map (Electron Microscopy Data Bank accession number EMD-6475), which was
adjusted to the same box size by the e2proc3d.py application in EMAN and low-pass
filtered to 40 Å in RELION. The 4.1 Å map shows qualitatively improved density,
with amino-acid side chains clearly visualized in the central region, compared
with the reported 4.2 Å map. With the refined particles as input, per-particle motion
correction and radiation-damage weighting (particle polishing) were performed.
The polished particles were subjected to the auto-refine procedure and resulted in a
reconstruction with an improved overall resolution of 3.9 Å.

A 3D classification into eight classes was performed using the polished
particles. Local angular searches around the refined orientations were used with an
angular sampling of 1.8°. Among the eight classes, two classes were suboptimal
and discarded. Five classes showed similar overall features and were combined for
auto-refinement, yielding a 3.8 Å map (class I). The remaining class showed
characteristic features: discernible density for the ‘tail’ region (β-subunit),
intracellular loops including the AID, the extended S6I, and a much better resolved
VSDyy. This class was refined separately and resulted in a reconstruction with an
overall resolution of 4.3 Å (class II). The particles from class I were subjected to
one additional round of 3D classification. Auto-refinement for subgroups of the
classes did not result in improvement of the overall resolution. Nevertheless, one
class that showed both clear ‘tail’ and AID was selected for auto-refinement, which
resulted in a map with an overall resolution of 4.3 Å (class Ia).

To further eliminate heterogeneous particles, we developed a method named
‘random-phase 3D classification’. Briefly, the particles were classified against two
references, among which the second reference was the same as the first one but
phase-randomized above a specified spatial frequency in each iteration. The data
set was 3D classified for several cycles with sufficient iterations in each cycle. The
spatial frequency above which the second reference was phase-randomized was
1/40 Å⁻¹, 1/20 Å⁻¹, 1/15 Å⁻¹, 1/12 Å⁻¹ and 1/10 Å⁻¹ for each cycle, respectively.
After each cycle, the particles prone to be classified into the phase-randomized
class were removed before the next cycle. The remaining particles after several
cycles of random-phase 3D classification were considered as ‘good’ particles and
subjected to routine 3D auto-refinement with RELION 1.4. The random-phase
3D classification method was implemented with home-modified RELION 1.4.
Eventually, the resolutions of the three classes (I, Ia, and II) were improved to
3.57 Å, 3.94 Å, and 3.94 Å, respectively.
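
The construction of the phase-randomized second reference can be sketched with
NumPy as follows. This is only an illustration of phase randomization beyond a
cutoff, not the modified RELION code used here: the function name, the cubic
test volume and the use of a noise volume as the source of random phases are
all assumptions.

```python
import numpy as np


def randomize_phases(volume: np.ndarray, pixel_size_A: float,
                     resolution_A: float, seed: int = 0) -> np.ndarray:
    """Return a copy of a cubic volume whose Fourier phases are random beyond
    the spatial frequency 1/resolution_A; amplitudes are left unchanged."""
    rng = np.random.default_rng(seed)
    n = volume.shape[0]
    ft = np.fft.fftn(volume)

    # Spatial frequency (cycles per angstrom) of every Fourier voxel.
    freqs = np.fft.fftfreq(n, d=pixel_size_A)
    fx, fy, fz = np.meshgrid(freqs, freqs, freqs, indexing="ij")
    radius = np.sqrt(fx ** 2 + fy ** 2 + fz ** 2)
    beyond_cutoff = radius > 1.0 / resolution_A

    # Phases taken from the transform of a real noise volume keep the
    # Hermitian symmetry, so the inverse transform is real.
    noise_ft = np.fft.fftn(rng.standard_normal(volume.shape))
    random_phase = noise_ft / np.abs(noise_ft)

    ft_random = np.where(beyond_cutoff, np.abs(ft) * random_phase, ft)
    return np.fft.ifftn(ft_random).real


# Cutoffs (in angstroms) for the successive cycles described in the text.
cutoffs_A = [40.0, 20.0, 15.0, 12.0, 10.0]
reference = np.random.default_rng(1).standard_normal((16, 16, 16))  # toy volume
decoys = [randomize_phases(reference, 1.32, c) for c in cutoffs_A]
```

Particles that match such a decoy as well as the intact reference carry little
reliable signal beyond the cutoff, which is the rationale for removing them
before the next cycle.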

Reported resolutions are based on the gold-standard Fourier shell correlation
0.143 criterion. All density maps were corrected for the modulation transfer
function of the detector and sharpened by applying a negative B-factor that was
estimated using automated procedures. Local resolution variations were estimated
using ResMap.
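
For readers unfamiliar with the criterion, the Fourier shell correlation (FSC)
between two independently refined half-maps and the resolution at which it
first falls below 0.143 can be computed along the following lines. This is a
simplified sketch, not the RELION implementation: it assumes cubic maps, uses
integer-radius shells, applies no masking, and reports the first shell below
the threshold without interpolation.

```python
import numpy as np


def fsc_curve(half1: np.ndarray, half2: np.ndarray) -> np.ndarray:
    """FSC per integer-radius Fourier shell for two cubic maps of equal size."""
    n = half1.shape[0]
    f1, f2 = np.fft.fftn(half1), np.fft.fftn(half2)

    # Integer shell index of every Fourier voxel.
    idx = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(idx, idx, idx, indexing="ij")
    shell = np.rint(np.sqrt(kx ** 2 + ky ** 2 + kz ** 2)).astype(int).ravel()

    n_shells = n // 2
    valid = shell < n_shells  # ignore the corners beyond the Nyquist shell
    cross = np.bincount(shell[valid], (f1 * np.conj(f2)).real.ravel()[valid],
                        minlength=n_shells)
    power1 = np.bincount(shell[valid], (np.abs(f1) ** 2).ravel()[valid],
                         minlength=n_shells)
    power2 = np.bincount(shell[valid], (np.abs(f2) ** 2).ravel()[valid],
                         minlength=n_shells)
    return cross / np.sqrt(power1 * power2)


def resolution_at_threshold(fsc: np.ndarray, box_size: int,
                            pixel_size_A: float, threshold: float = 0.143):
    """Resolution (angstroms) of the first shell with FSC below the threshold,
    or None if the curve never drops below it."""
    below = np.nonzero(fsc < threshold)[0]
    if below.size == 0 or below[0] == 0:
        return None
    return box_size * pixel_size_A / below[0]
```

Shell k corresponds to a spatial frequency of k / (box size × pixel size), which
is why the resolution is the reciprocal of that quantity.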

Model building and refinement. The 3.6 Å map (class I) was used for the
majority of model building, while the two 3.9 Å maps (class Ia and class II) were
used for model building of VSDy; and the intracellular domains, as well as analysis
of conformational changes.

The previously reported structure (PDB accession number 3JBR) was used as
the starting model. De novo building was performed for the α1-, α2δ- and γ-subunits
in COOT. Sequence assignment was guided mainly by bulky residues such
as Phe, Tyr, Trp, and Arg. The chemical properties of amino acids were
considered to facilitate model building. The densities for glycosylation sites and
mass spectrometric analysis of the crosslinking results and disulfide bonds were
used for model confirmation. Then the modelling of each subunit was separately
performed with RosettaCM using the manually built model as template and the
experimental cryo-EM maps as guide. This process helped to optimize the
model and to build some missing parts in the structure. For this step, ten models
were generated in Rosetta for each subunit and the best one was selected by
comparison of the models with the map. The selected models for each subunit were
then merged together and further manually adjusted in COOT. As the density
for the ‘tail’ region is weak in the class I map, we used the class Ia map for
identification and docking of the β-subunit. The structure of the β2-subunit in
complex with the Cav1.2 I-II linker (PDB accession number 4DEY) was docked into
the class Ia map by COOT, and fitted into the density by CHIMERA. The
distinguishable secondary structures in the map of the ‘tail’ and the discernible
AID ensured reliable docking. We also generated a model of the α1-subunit and
docked the β-subunit into the class II map. The model building procedure was
similar to the aforementioned process. For VSDy, the backbones were first built in
the class II map, and subsequently transferred to the class I map for side-chain
assignment of S1-2y and S4;y segments.

In total, we built 2,661 residues with 2,595 assigned side chains for the overall
complex. The CTD and Sy; segment of the α1-subunit were built as poly-Ala. The
intracellular II-III linker and the C-terminal sequences following CTD in α1 and a
small fragment of the γ-subunit were not modelled owing to the lack of corresponding
densities in the maps.

Structure refinement was performed using the phenix.real_space_refine application
in PHENIX in real space with secondary structure and geometry restraints to
prevent structure over-fitting. The final model was refined against the overall 3.6 Å
cryo-EM map using REFMAC in reciprocal space, using secondary structure
restraints that were generated by ProSMART. Overfitting of the overall model
was monitored by refining the model in one of the two independent maps from
the gold-standard refinement approach and testing the refined model against the
other map (Extended Data Fig. 1b). Statistics of 3D reconstruction and model
refinement can be found in Extended Data Table 1.

Extended Data Figure 1 | Cryo-EM analysis of the rabbit Cav1.1 complex.
a, A representative electron micrograph of the Cav1.1 complex. Scale bar, 400 Å.
b, Two-dimensional class averages of the electron micrographs. The box size and
circle diameter are 264 Å and 220 Å, respectively. c, Angular distribution for the
final reconstruction of the Cav1.1 complex. Each cylinder represents one view and
the height of the cylinder is proportional to the number of particles for that view.
d, The gold-standard Fourier shell correlation curves for the EM maps. See
Extended Data Fig. 2 and Methods for details of the three classes. e, Fourier shell
correlation curves of the refined model versus the overall 3.6 Å map that it was
refined against (black); of the model refined in the first of the two independent
maps used for the gold-standard Fourier shell correlation curves versus that same
map (red); and of the model refined in the first of the two independent maps versus
the second independent map (green). The small difference between the red and
green curves indicates that the refinement of the atomic coordinates did not suffer
from overfitting. f, The overall EM maps of the Cav1.1 complex are colour-coded to
indicate the range of resolutions. See Extended Data Fig. 2 for the definition of
classes I, Ia, and II. The resolution maps are calculated with ResMap.

Extended Data Figure 2 | Flowchart for cryo-EM data processing of the
Cav1.1 complex. See ‘Image processing’ in Methods for details of (a) the
flowchart and (b) the random-phase 3D classification method.

Extended Data Figure 3 | The new map reconstructed at 3.6 Å exhibits
qualitative improvement over the reported 4.2 Å map. The same four
perpendicular side views are shown for the published 4.2 Å map (a), the new
3.6 Å map presented here (b), and their superimposition (c). The EM maps
were generated in Chimera.

Extended Data Figure 4 | Topology and EM maps of the α2δ-1-subunit.
a, Topology of the α2δ-subunit. The domains are coloured the same as the domain
structures shown in Fig. 2. b, EM map for the overall α2δ-1-subunit. The EM map
for the C-terminal stretch of the δ-subunit is shown at the bottom and coloured
orange. As seen in the EM map, the consecutive density of the δ-subunit extends
slightly beyond Cys1074. The additional density would correspond to Gly1075 if
the C terminus were not cleaved during maturation, or alternatively, the
ethanolamine of glycophosphatidylinositol (GPI) that modifies Cys1074. No peptide
was detected for sequences after Cys1074 in the MS analysis of the purified Cav1.1
complex, and no additional density was found that may correspond to the
C-terminal sequences of the δ-subunit. We thereby assigned an ethanolamine to
the density following Cys1074. c, EM maps of representative β-strands in the
α2-subunit. Left: the EM maps for the β-sheets in the VWA (upper) and cache1
(bottom) domains. Right: the EM maps for representative β-strands in the
α2-subunit. NAG, N-acetylglucosamine. d, EM maps of representative α-helices in
the α2-subunit. e, The EM maps for the MIDAS motif in the VWA domain and the
loop between S1 and S2 in the first VSD of α1 (designated the L1-2I loop). The
density corresponding to the cation is coloured magenta. The maps were generated
using class I reconstruction and contoured at 6-8σ in PyMol.

Extended Data Figure 5 | Representative EM maps for segments in the
α1-subunit. The EM maps for the S2 and S4 segments in the four VSDs (a),
the S5 (b), and S6 (c) segments in the pore domain are shown. d, The
EM map of the selectivity filter and the supporting P1 and P2 helices.
Side-chain assignment was assisted by bulky residues in P1 and P2 helices
as exemplified in the right panel. A consecutive stretch of density was
observed along the selectivity filter vestibule. e, The densities that may
correspond to lipid molecules bound to the α1-subunit. The maps were
made using class I reconstruction and contoured at 4-6σ in PyMol.

Extended Data Figure 6 | Structural features of the pore domain of the
α1-subunit. a, The structure of the pore domain in four perpendicular
side views. The disulfide bonds and glycosyl moieties are shown as sticks.
Left: the densities below the selectivity filter that may correspond to lipid
tails are shown at 5σ. The modelled lipids are shown as yellow sticks.
Similar densities that penetrate the side portals of the central cavity of the
pore domain were previously observed in the structures of NavAb and
NavRh. b, The negative surface potential of the extracellular loops and
the fenestrations of the pore domain. The surface electrostatic potential
was calculated in PyMol. The fenestrations in the transmembrane region
are highlighted by orange circles. c, The potential extracellular Ca²⁺
entrances through the windowed dome of the α1-subunit. Two potential
entrances for Ca²⁺ are contoured with yellow lines in the top panels. The
residues that underlie the negative surface potentials are shown in the
bottom panels. d, The interface between the α2δ-1- and α1-subunits. The
L1-2I, L5y, and L5;y loops form the docking site for the VWA and cache1
domains of the α2δ-1-subunit.

Extended Data Figure 7 | Putative Ca²⁺ coordination in the selectivity
filter vestibule. a–c, The densities that may correspond to bound Ca²⁺
ions in the selectivity filter vestibule exhibit distinct features in the present
and previously published maps. The Ca²⁺ concentrations for the samples
that yielded the new 3.6 Å and previous 4.2 Å maps are 10 and 0.5 mM,
respectively. The maps were contoured at 5σ. Even when low-pass filtered
to 4.2 Å, the density in the selectivity filter vestibule remains a stretch
instead of a sphere. Nevertheless, it remains to be investigated whether
the stretch of the density observed in the selectivity filter vestibule indeed
corresponds to two Ca²⁺ ions. d, Comparison of Ca²⁺ coordination by
different Cav and Nav channels. The structure of Cav1.1 is superimposed
with CavAb, NavRh, and TPC1 relative to their respective selectivity
filters. The PDB accession numbers for CavAb, NavRh, and TPC1 are
4MS2, 4DXW, and 5E1J, respectively. The tentatively assigned Ca²⁺ ions in
Cav1.1 are coloured green and those in the other three indicated channels
are coloured wheat.

Extended Data Figure 8 | Conformational changes of the intracellular
domains. a, The 3.9 Å map calculated from class Ia particles is almost
identical to the 3.6 Å map (class I) except for the better resolution of
the β-subunit and the AID motif of α1. b, Distinct conformations of
the intracellular domains in class Ia and II reconstructions. The shifts
of the β-subunit and the AID motif from class Ia to class II maps are
indicated by orange arrows. The adjacent segments in the pore domain
also drift to different degrees. For visual clarity, the maps are low-pass
filtered to 6 Å. See the Supplementary Video for the morph illustrating
the conformational changes of the intracellular elements of the Cav1.1
complex. c, The intracellular III-IV linker of α1 is well resolved. Shown
here are class Ia and class II EM maps generated in Chimera. The III-IV
linker of α1 forms a short helix that interacts with the carboxyl terminal
domain (CTD) of the α1-subunit.

Extended Data Table 1 | Statistics of 3D reconstructions and model refinement

Data collection
  EM equipment                         FEI Titan Krios
  Voltage (kV)                         300
  Detector                             Gatan K2
  Pixel size (Å)                       1.32
  Electron dose (e⁻/Å²)                50
  Defocus range (μm)                   1.3~2.9

Reconstruction
  Software                             RELION 1.4
  Maps                                 Class I     Class Ia    Class II
  Number of used particles             527,833     113,165     123,274
  Symmetry                             C1
  Final resolution (Å)                 3.57        3.94        3.94
  Map sharpening B-factor (Å²)         -157.5      -146.7      -142.3
  Accuracy of rotation (°)             1.444       1.638       1.331
  Accuracy of translation (pixels)     0.704       0.779       0.760

Model building
  Software                             Coot & Rosetta

Refinement
  Software                             Phenix & Refmac
  Average Fourier shell correlation    0.859
  R-factor                             0.306

Model composition
  Protein residues                     2,661
  Side chains                          2,595
  Ion (Ca²⁺)                           3
  Sugar                                2
  Lipid                                14

Validation
  R.m.s. deviations
    Bond length (Å)                    0.013
    Bond angle (°)                     1.360
  Ramachandran plot statistics (%)
    Preferred                          86.7
    Allowed                            11.0
    Outlier                            2.3

Extended Data Table 2 | Disulfide bonds and chemically crosslinked lysine pairs identified from the Cav1.1 complex


Site1- Site2 Proteinase K Lys-C/Elastase Lys-C/Trypsin Lys-C/Trypsin/Glu-C 
02 (907)- 8 (977) 3.23E-04 (4,7) 1.21E-16 (5,34) 3.66E-23 (1,77) 4.78E-20 (2,32) 
a1 (957)- «1 (968) 7.40E-10 (1,3) 9.30E-14 (3,14) 9.34E-14 (1,6) 
a2 (356)- 8 (1062) 4.31E-06 (1,3) 
2 (305)- 8 (1047) 6.84E-08 (3,5) 3.35E-19 (20,156) 9,39E-18 (1,10) 8.34E-18 (4,20) 
a2 (305)- & (1027,1029,1047)* 7.09E-08 (2,6) 
8 (999, 1002) 1.67E-05 (3,4) 7.76E-05 (1,2) 5.47E-18 (1,13) 9.06E-12 (1,4) 
8 (1027,1029) 1.43E-06 (2,3) 2.38E-14 (1,2) 1.64E-08 (1,5) 7.56E-07 (1,4) 
8 (999,1002)- 8 (1027,1029) 4.69E-09 (19,37) 3.19E-07 (1,15) 5.75E-09 (1,9) 
2 (670)- «2 (700) 6.13E-10 (25,65) 1.30E-18 (3,57) 
02 (837,842,844,853) 3.52E-06 (4,8) 
2 (837,842,844)- a2 (853)* 6.14E-09 (4,30) 1.02E-09 (2,10) 4.81E-08 (4,17) 
2 (844,853) 7.64E-12 (2,3) 
ot (1214,1219) 6.00E-13 (2,2) 
(57) (80) 1.99E-07 (2,13) 
b 
Site1- Site2 #Spec-Total Best E-value Site1- Site2 #Spec-Total Best E-value Site1- Site2 #Spec-Total Best E-value 
O12 (234)- o1 (976) 8 2.37E-14 2 (90)- c12 (97) 25 6.50E-11 1 (1094)- a1 (1504) 14 6.66E-07 
a1 (356)- B (395) 47 3.39E-11 2 (148)- «2 (693) 3 7.31E-11 1 (1446)- a1 (1476) 6 6.95E-07 
B (151)- o11 (693) 19 6.57E-11 B (95)- B (99) 17 1.15E-10 11 (1245)- 011 (1478) 4 1.19E-06 
ort (352)- 8 (402) 15 1.02E-09 ctl (356)- 1 (677) 31 4.67E-10 012 (336)- «2 (377) 13 2.21E-06 
1 (352)- B (395) 18 3.42E-09 B (322)- B (331) 18 5.08E-10 a1 (1446)- a1 (1538) 4 2.89E-06 
¥ (1)- 1 (1094) 8 2.96E-07 ol (704)- «1 (719) 9 2.12E-09 a1 (1535)- a1 (1550) 10 3.89E-06 
B (395)- 1 (693) 4 2.46E-06 11 (698)- «11 (704) 44 4.28E-09 1 (677)- a1 (698) 3 5.12E-06 
B (83)- 01 (352) 3 2.48E-06 onl (1414)- 01 (1418) 19 6.79E-09 1 (1496)- 011 (1550) 4 7.60E-06 
B (83)- ox1 (356) 4 3.94E-06 011 (693)- «1 (708) 13 6.98E-09 B (83)- B (139) 3 8.93E-06 
B (151)- a1 (685) 4 1.34E-05 11 (356)- 1 (685) 3) 4.39E-08 1 (703)- a1 (708) 3 1.27E-05 
B (139)- a1 (1535) 3 1.80E-05 1 (677)- 1 (693) 13 1.47E-08 1 (1478)- a1 (1496) 20 1.69E-05 
fi (402)- a1 (1414) 3 3.67E-05 11 (685)- «1 (693) 14 4.71E-08 1 (693)- a1 (719) vf 1.86E-05 
1 (162)- B (402) 3 5.07E-05 B (83)- B (322) 6 1.74E-08 a1 (352)- «1 (719) 2 1.94E-05 
011 (356)- fs (402) 11 8.13E-05 ol (1478)- ox (1550) 2 1.81E-08 cu (14)- 01 (18) 3 1.98E-05 
B (402)- a1 (677) 8 1.27E-04 B (83)- B (99) 19 2.19E-08 ctl (1538)- 011 (1549) 5 2.56E-05 
¥ (1)- 4 (1414) 3 2.66E-04 B (395)- B (402) 27 2.26E-08 B (402)- B (406) 8 3.12E-05 
Cc ott (677)- 1 (719) 4 2.61E-08 11 (698)- o1 (708) 6 3.34E-05 
7 Site1-Site2_#Spec-Total Best E-value 12 (97)- a2 (108) 35 4.65E-08 011 (698)- 01 (719) 4 3.59E-05 
72 (652)-0:2(693)11~=S«s:«. 1-28 11 (704)- 1 (710) 4 5.37E-08 1 (1538)- 011 (1550) 2 4.63E-05 
«22 (637)- «2 (693) 40 2. 86E-18 B (83)- B (97) 1 6.25E-08 1 (1418)- «1 (1535) 2 5.92E-05 
B (99}- B (139) 59 1.06E-15 ott (352)- «1 (677) 19 7.29E-08 012 (727)- 02 (733) 3 7.29E-05 
012 (137)- 02 (148) 12 4.08E-15 a1 (703)- a1 (719) 5 1.77E-07 012 (336)- 02 (380) 2 1.14E-04 
B (139)- B (331) 76 5.53E-15 ot (677)- a1 (704) 3 2.60E-07 1 (162)- «1 (1478) 2 1.44E-04 
012 (234)- 0:2 (554) 6 1.53E-13 cul (708)- 1 (719) 4 3.09E-07 ol (1094)- 011 (1499) 4 2.02E-04 
02 (137)- «2 (234) 7 2.85E-13 11 (693)- a1 (703) 6 3.14E-07 11 (1535)- 011 (1549) 4 2.11E-04 
oA (693)- 01 (704) 48 4.50E-13 CANCERS P CANE) SHEE O12 (108)- 012 (442) 2 2.76E-04 
ot (1083)- 1 (1094) 11 6.40E-13 02 (97)- 0.2 (197) 8 4.10E-07 02 (148)- c12 (579) 6 2.84E-04 
B (99)- B (331) 25 2.10E-12 11 (693)- 1 (710) 8 4.62E-07 11 (352)- «1 (693) 2 3.01E-04 
6 (83)- B (95) 25 2.13E-12 ot (677)- 1 (685) 10 5.31E-07 1 (356)- 011 (1414) 2 3.46E-04 
2 (97)- «12 (442) 4 4.64E-11 1 (703)- 1 (710) 10 5.54E-07 1 (345)- a1 (352) 7 7.12E-04 


a, Disulfide bonds of Cav1.1 were identified using pLink-SS after the complex was digested with different proteases and subjected to LC-MS/MS analysis. In the structure, four disulfide bonds are
observed between the α2- and δ-subunits (Cys305-Cys1047, Cys356-Cys1062, Cys406-Cys1074, Cys907-Cys977), and two adjacent disulfide bonds are found within the δ-subunit (Cys999-Cys1029,
Cys1002-Cys1027). Shown in the table are the best E values (number of peptide pairs, number of spectra). The underlined Cys residues form a loop-linked disulfide bond. b, c, The Cav1.1 complex was
crosslinked with DSS (disuccinimidyl suberate) and digested with trypsin. Following LC-MS/MS analysis of the peptides, crosslinked lysine pairs between (b) or within (c) subunits were identified using
pLink. The filtering criteria were the same for a-c: identified spectra passing a false discovery rate cutoff of 0.05 were further filtered by requiring E value <0.001 and spectral counts ≥2. The disulfide
bonds and crosslinked lysine pairs that match the cryo-EM structure are highlighted in red.

Cryo-EM structure of the spliceosome immediately after branching

Wojciech P. Galej¹, Max E. Wilkinson¹, Sebastian M. Fica¹, Chris Oubridge¹, Andrew J. Newman¹ & Kiyoshi Nagai¹


Precursor mRNA (pre-mRNA) splicing proceeds by two consecutive transesterification reactions via a lariat-intron
intermediate. Here we present the 3.8 Å cryo-electron microscopy structure of the spliceosome immediately after lariat
formation. The 5′-splice site is cleaved but remains close to the catalytic Mg²⁺ site in the U2/U6 small nuclear RNA (snRNA)
triplex, and the 5′-phosphate of the intron nucleotide G(+1) is linked to the branch adenosine 2′OH. The 5′-exon is held
between the Prp8 amino-terminal and linker domains, and base-pairs with U5 snRNA loop 1. Non-Watson-Crick interactions
between the branch helix and 5′-splice site dock the branch adenosine into the active site, while intron nucleotides +3 to
+6 base-pair with the U6 snRNA ACAGAGA sequence. Isy1 and the step-one factors Yju2 and Cwc25 stabilize docking of
the branch helix. The intron downstream of the branch site emerges between the Prp8 reverse transcriptase and linker
domains and extends towards the Prp16 helicase, suggesting a plausible mechanism of remodelling before exon ligation.


The spliceosome is a dynamic molecular machine that catalyses
pre-mRNA splicing in two sequential transesterifications analogous
to group II intron self-splicing. The major spliceosomal components
(U1, U2, U4/U6, and U5 small nuclear ribonucleoprotein particles
(snRNPs), and the two large NineTeen and NineTeen Related (NTC and
NTR) protein complexes) assemble de novo on pre-mRNA substrates
in an ordered manner. Initially U1 and U2 snRNPs recognize
the 5′-splice site (5′SS) and branch point sequences of pre-mRNA;
subsequently the pre-assembled U4/U6.U5 tri-snRNP is recruited to
form the fully assembled spliceosome (complex B). During catalytic
activation Prp28 helicase displaces the 5′SS from U1 snRNP and
allows it to base-pair with the U6 snRNA ACAGAGA sequence.
Brr2 helicase unwinds the U4/U6 snRNA duplex to release U4 snRNA
and its associated proteins, allowing recruitment of the NTC and
NTR complexes. The resulting complex Bᵃᶜᵗ is then remodelled to
complex B*, which recruits step-one-specific factors Yju2 and Cwc25.
These factors stabilize a network of RNA interactions comprising U2,
U5 and U6 snRNAs, which position the pre-mRNA 5′SS and branch
point sequences for catalysis of the first transesterification (branching)
producing 5′-exon and lariat intron–3′-exon intermediates. The resulting
complex C is further remodelled to complex C* in which the 5′- and
3′-exons are aligned on U5 snRNA loop 1 to produce spliced mRNA
and lariat intron products via the second transesterification (exon
ligation). The spliced mRNA is released and the remaining intron
lariat spliceosome (ILS) is disassembled, recycling the snRNPs for new
rounds of splicing.

During this splicing cycle DExD/H-box helicases are recruited to
the spliceosome at specific steps to remodel RNA-RNA interactions
and induce binding or release of auxiliary factors. Specifically,
after branching, the step-one factors Yju2 and Cwc25 are released by
Prp16 helicase and Prp18-Slu7 and Prp22 are recruited to produce
catalytically active complex C* (ref. 13). Following exon ligation, the
spliced mRNA is released by Prp22 helicase and the residual ILS is
disassembled by Prp43 helicase.

Here we describe the cryo-electron microscopy (cryo-EM) structure
of the spliceosome captured immediately after branching. This
structure provides insight into recognition and positioning of the 5′SS
and branch point at the active site, elucidates how proteins stabilize the
architecture of the catalytic RNA core, and provides a molecular basis
to understand the functions of RNA helicases and auxiliary factors in
remodelling the spliceosome.


Overview of the structure 

Spliceosomes from the yeast Saccharomyces cerevisiae were assembled
on UBC4 pre-mRNA substrate with a mutation of the 3′-splice site
(3′SS) sequence UAGAG to UACAC, and purified via an affinity-tag
on Slu7 or Prp18 (Methods). The purified spliceosomes contained
predominantly lariat intron–3′-exon intermediates (Extended Data
Fig. 1), indicating that the purified spliceosomes represent complex C.
We obtained a cryo-EM reconstruction at 3.8 Å overall resolution
(Methods; Extended Data Figs 1-6; Extended Data Table 1) into which
44 components have been modelled (Fig. 1; Extended Data Table 2;
Supplementary Information). The U5 snRNP forms the core of the
complex, which cradles the active site (Fig. 1a). Assembling onto this
core, the NTC and NTR act as a multipronged clamp that stabilizes
binding of the U2 snRNP core, the substrate, and auxiliary splicing
factors to the U5 snRNP (Fig. 1a–c). The helicase module containing
Brr2 and Prp16 protrudes from the U5 snRNP core (Fig. 1a, b).

Figure 1 | Subunit architecture of the spliceosomal complex C.
a–c, Three orthogonal views of the complex coloured according to the
subunit identity. d, A list of all 44 modelled subunits of the complex
grouped into functional sub-complexes.

As in U4/U6.U5 tri-snRNP, the Large domain of Prp8 (ref. 21)
forms the foundation of the assembly together with the stable foot unit,
comprising GTP-bound Snu114 and the N-terminal domain of Prp8,
firmly gripping the U5 snRNA (Fig. 2a, b). Prp8 has undergone a large
structural change including a 30° rotation of the foot with respect to the
Large domain when compared to U4/U6.U5 tri-snRNP (Extended
Data Fig. 7). U4 snRNA and its associated proteins have been released
upon unwinding of the U4/U6 duplex by Brr2 (ref. 6). The 3′-domain
of U2 snRNP comprising Msl1 (U2B″), Lea1 (U2A′) and the Sm core
domain bridges the Prp8 RNaseH-like domain and the N-terminal
HAT (Half-a-TPR)-repeat domain of Syf1 (Fig. 2a). Isy1 and Cef1
dock with the N-terminal and reverse transcriptase (RT)-like domains
of Prp8 (ref. 21), respectively, and anchor the N-terminal end of Clf1
together with Prp45/Prp46 (Fig. 2c, d). These interactions support the
HAT-repeat arches of Syf1 and Clf1 suspended over the Large domain
of Prp8. The 5′ part of U2 snRNA and the 3′ part of U6 snRNA run
side-by-side from the active site forming nine consecutive base-pairs
extending towards the centre of the Syf1 HAT-repeat arch (Fig. 2a–e).
Bud31 anchors the 5′-stem of U6 snRNA to the N-terminal domain of
Prp8 (Fig. 2c). Cwc2 is wedged between Bud31, Ecm2 and Prp45 and
guides the path of U6 snRNA (Fig. 2c). U2 snRNA downstream of the
branch helix extends from the active site towards the 3′-domain of U2
snRNP, forming two stems bridging the U2 Sm ring with Ecm2/Cwc2
and the main body of the complex (Fig. 2d, e). Density for two RNA
helices emanating from the U2 Sm ring is consistent with a stem-loop
IIb/stem IIc arrangement and the catalytically competent conformation
of the active site (Fig. 2f). The C-terminal region of Cwc21 forms
a coiled-coil that interacts with Snu114 (ref. 25) (Fig. 2a) while the
N-terminal half of Cwc21 extends towards Prp8 and points into the
U5 snRNA stem minor groove.

Two large regions of weak density extend from the well-ordered
core of the complex (Extended Data Fig. 1e). Focused classification
allowed us to select subsets of particles (core + helicase, core + Prp19)
(Extended Data Fig. 2), in which less well-ordered components can be
more clearly visualized. The weak density observed in the latter class
is readily attributable to Prp19, Cef1 and Snt309 based on its distinct
shape first observed in ILS, but the weaker density in complex C
suggests these proteins are more loosely attached to the core than in
ILS. A large lobe corresponding to a DEAH helicase in contact with
Cwc25 is observed near the intron exit channel, downstream of the
branch point. Although its limited resolution does not allow us to build
a model de novo, the density is of sufficient quality to fit a DEAH box
helicase model unambiguously (Extended Data Fig. 6; Extended Data
Table 2) and it has been interpreted as Prp16 as it contacts Cwc25. An
even larger domain is observed in contact with the DEAH helicase
domain. The structure of Brr2 helicase coupled to the Jab1/MPN
domain of Prp8 (ref. 27) can be docked into this density, consistent
with an interaction between Prp16 and Brr2 (ref. 28).

Figure 2 | Overview of the core structure.
a, Prp8 and its central role in organizing the
entire assembly (SII denotes U2/U6 stem II).
b, RNA only in the same orientation as in a
(ISL, U6 snRNA internal stem-loop; 5′SL, U6
snRNA 5′ stem-loop; SL1, U5 snRNA stem-loop 1;
VSL, U5 snRNA variable stem-loop;
S3, U5 snRNA Stem III). c, Ecm2, Cwc2 and
Bud31 binding to the 5′ end of the U6 snRNA.
d, Top view of the complex. e, RNA only in the
same orientation as in d. f, Secondary structure
diagram for the 3′ end of U2 snRNA. Prp8ᴺ,
Prp8ᴸ and Prp8ᴿᴴ denote N-terminal, Large and
RNaseH-like domains of Prp8.

Figure 3 | Structure of the RNA catalytic core. a, Key RNA elements at
the active site. BP, branch point; ISL, internal stem-loop; M1 and M2,
catalytic metal ion one and two. b, Orthogonal view illustrating the branch
helix and helices Ia and Ib of U2/U6 snRNA duplex. c, The branch helix
and 5′-exon with the 2′-5′ phosphodiester linkage (red arrow). d, Intricate
RNA interactions at the active site (dotted lines indicate base triples; dot
and star indicate G-U wobble and other non-canonical base-pairs). e, Base
triple interaction between the branch helix and 5′-splice site. f, A network
of interactions in the branch helix. g, Hoogsteen base-pair between intron
A(+3) and G50 of U6 snRNA.


Active site 

The map shows that the phosphodiester bond at the 5’SS is cleaved 
and the 5’-phosphate of the first intron nucleotide G(+1) forms a 
2’-5’ phosphodiester linkage with the branch point adenosine (A70), 
in agreement with the RNA analysis (Extended Data Figs 1b and 4b). 
The key RNA elements assemble around the active site harbouring 
the magnesium ion binding sites (Fig. 3). The 3’OH of the 5’-exon 
remains close to the 5’-phosphate of G(+1) such that the normal 5’-3’ 
phosphodiester linkage at the 5’SS could be restored with minimal 
structural alteration (Fig. 3c). The adenine base of branch point A70 
is bulged out from the branch helix and its N1 and 6-amino group 
are hydrogen-bonded to the 2’OH and O2 of U68, creating a unique 
backbone conformation that enables the 2’OH of A70 to project 
towards the 5’-phosphate of intron G(+1) (Fig. 3f). In yeast the intron 
sequence following the 5’SS is stringently conserved as GUAUGU”. The 
G(+1) base is partially packed against the A70 base while the U(+2) 
base is within hydrogen-bonding distance of U2 snRNA G37, suggesting 
a possible base-triple interaction with intron C67 (Fig. 3e). Mutation of 
G(+1) to C, or of the branch A70 to C, would disrupt these interactions, 
consistent with the strong branching defects observed for these 
mutations”’. Four conserved intron nucleotides A(+3)U(+4)G(+5)U(+6) 
form sequence-specific base-pairs with part of the ACAGAGA 
sequence of U6 snRNA”*3°3!. The three 5’-exon nucleotides 
A(−2)A(−3)A(−4) form Watson-Crick base-pairs with loop 1 of U5 
snRNA (ref. 11) (Figs 3b, 4). Notably, the 5’-exon winds through a narrow 
channel between the Large and N-terminal domains of Prp8 formed 
during spliceosome activation (via 30° foot rotation) (Extended Data 
Fig. 7c) and stabilized by Cwc21 and the C-terminal domain of Cwc22 
(Fig. 4a, b). Cwc22 consists of two HEAT repeat-containing domains 
that straddle the 5’-exon tunnel, providing insight into exon-junction 
complex deposition in higher eukaryotes (ref. 32) (Extended Data Fig. 8). 

U6 snRNA following the ACAGAGA sequence forms helices Ia and 
Ib by base-pairing with U2 snRNA and folds back to form an intramolecular 
stem loop (ISL), in agreement with the structure inferred from 
genetics (ref. 33) (Fig. 3b, d). Helices Ia and Ib show continuous base-stacking 
and the bulged U2 snRNA nucleotides U24 and A25 protrude from 
helix I and bind to the Prp8 RT domain (Figs 3d, 4d, e and 5a). The 
Watson-Crick faces of U6 snRNA nucleotides G52 and A53 interact 
with the Hoogsteen faces of G60 and A59, respectively, forming two 
consecutive base triples as inferred from genetics (ref. 34) (Fig. 4e, f). C66 
and A79 bulge out from the ISL (Fig. 3a, b), allowing continuous 
base-stacking of the bulged U80 with G52 and A53 and stabilizing the 
catalytic triplex. It has been proposed that pre-mRNA splicing reactions 
are catalysed by a two-metal-ion mechanism (ref. 35). Indeed, ligands for the 
two divalent metal ions have been identified by stereo-specific 
phosphorothioate substitutions and metal rescue experiments (ref. 36), and density 
attributable to Mg2+ ions is observed adjacent to these ligands (Extended 
Data Fig. 5). The 5’-exon 3’OH and the 5’ phosphate of G(+1) remain 
close to M1, while U6 snRNA metal ligands have repositioned slightly, in 
agreement with the previously observed repositioning of the branch in 
structures of a branched group II intron (ref. 37). Nonetheless, the branch helix 
remains ‘docked’ at the catalytic Mg2+ site, in contrast to its ‘undocked’ 
configuration observed in the ILS structure, where it swings away from 
the ACAGAGA helix by 90° (ref. 26; Extended Data Fig. 5). 

The intron downstream of the 5’SS GUAUGU sequence exits the 
active site near Cwc2, Ecm2, Clf1, Cef1 and Isy1 (Fig. 2), re-enters 
the spliceosome and runs side-by-side with U2 snRNA in the opposite 
direction through a channel between the Prp8 Endonuclease and 
RNaseH-like domains (Extended Data Fig. 7). The intron then forms the 
branch helix with the GYAGUA sequence of U2 snRNA in proximity 
to the catalytic Mg2+ site (Fig. 3b, d) and exits the active site through a 
channel made by the linker and RT-like domains of Prp8 (Fig. 2). 


Roles of proteins around the active site 

The RNA network at the active centre, comprising U2, U5 and U6 
snRNAs and RNA substrate, is stabilized by a number of proteins 
(Figs 1, 2 and 4). The catalytic RNA core is surrounded by the linker 
and the helix bundle domains of Prp8 (refs 19,21) on one side and 
by NTC proteins (Prp45, Prp46, Isy1 and Cef1) and step-one factors 
(Yju2 and Cwc25) on the other side, which together stabilize the 
catalytic RNA core for branching. Remarkable stacking of Prp8 Tyr671 
and Tyr1620 against bases at positions G(−5) and A(−6) stabilizes the 
5’-exon:U5 snRNA loop 1 pairing (Fig. 4b, c). The linker between the 
N-terminal and Large domains of Prp8 runs across the major groove 
of U6 ISL, which is positioned in a pocket formed by Prp8 and Clf1, 
and the interactions are sealed by the extended N terminus of Cwc15 
(Fig. 4d). Cef1 stabilizes the U2/U6 catalytic triplex (ref. 34) (Fig. 4e, f). 
Step-one-specific factors probe the branch helix and stabilize its 
docking at the catalytic core (Fig. 5). A long α-helix of Cwc25 contacts 
the RNaseH-like domain and α-finger of Prp8 and its N terminus is 
inserted into the widened major groove of the bulged branch helix 
(Fig. 5b, d). The N terminus of Yju2 wraps around the branch helix 
(Fig. 5d) and its Arg4 makes a base-specific contact with the intron 
U(+2) while its main chain amide group contacts the backbone 
phosphate of the 5’-exon A(−2) (Fig. 4c). Isy1 projects its N terminus 
deep into the active site, forming contacts with the phosphate backbone 
of intron U68. Ser2 of Isy1 forms a hydrogen bond with the O2 
carbonyl group of U(+2) of the intron. One of the Isy1 helices inserts 
into the minor groove of the ACAGAGA/5’SS helix. Cwc25 forms 
multiple contacts with the branch site, consistent with cross-linking 
experiments (ref. 38) and its role in juxtaposition of the 5’SS and branch 
point for branching (refs 39-41). These spliceosomal factors are reminiscent 
of ribosomal proteins L27 and L16, which penetrate into the peptidyl 
transferase active site and stabilize tRNA binding (ref. 42). 

Figure 4 | Proteins at the active site. a, 5’-exon channel formed between 
the Large and N-terminal domains of Prp8, Cwc21 and Cwc22. b, 5’-exon-U5 
loop 1 interaction surrounded by Prp8. Th/X denotes thumb/domain X of 
Prp8 (residues 1,300-1,375). c, Interactions between the 5’-exon, the 
N-terminal (purple) and Large (blue) domains of Prp8, and Yju2 (green). 
Interactions involving protein main and side chains are shown by solid 
and dotted lines. d, Components surrounding U6 internal stem-loop. 
e, Prp8 and Cef1 (Myb1 domain) stabilize the catalytic triplex. HB 
denotes helix bundle of the RT-like domain (residues 750-870). 
f, Structure of the catalytic triplex. 

Figure 5 | Step-one factors and branch-site positioning. a, Interaction 
between the RNA catalytic core and Prp8. b, Positioning of the branch 
helix by step-one factors. c, Corresponding view in S. pombe post-splicing 
ILS complex (ref. 26), showing marked repositioning of the branch helix and its 
further stabilization by debranching co-factor Cwf19. d, A close-up view 
of step-one factors interacting with the branch helix. 


Remodelling of the spliceosome 

The intron downstream of the branch point emerges from the exit 
channel formed by the Prp8 RT-like and linker domains and the 
α-finger, and projects towards Prp16 (Fig. 6a). Twelve nucleotides could 
span the distance between the last ordered intron nucleotide (branch 
point +6) and the substrate RNA entry site of Prp16, consistent with 
Prp16 crosslinking to 4-thiouridine introduced 18 nucleotides downstream 
of the branch point (ref. 43). Prp16 translocates 3’→5’ towards the 
branch point along the intron upon ATP hydrolysis**’. Prp16 would 
thus pull the branch helix out of its pocket and hence destabilize the 
binding of Yju2 and Cwc25 (Fig. 6b). The undocked branch helix would 
allow the 3’-exon to enter the active site (ref. 45) and bind to U5 snRNA loop 
1 (refs 11,12). Consistent with this, destabilization of the branch helix 
by Isy1 deletion suppresses splicing defects caused by Prp16 
mutations (ref. 46). The step-two factors Prp18 and Slu7 are likely to dock into 
the space vacated by the branch helix/Yju2/Cwc25 to stabilize the 3’SS 
in the active site, as Slu7 and Prp18 are in direct contact with the 3’SS 
bound to U5 snRNA loop 1 before exon ligation (ref. 47) (Fig. 6b). Prp22 binds 
the 3’-exon at position +17 (ref. 15). Translocation of Prp22 on the 
3’-exon in the 3’→5’ direction towards the active centre!>? would 
displace Prp18-Slu7, releasing the mRNA. In our structure, the density 
assigned to Prp16 is in direct contact with Cwc25 (Fig. 6a), consistent 
with Cwc25 stabilizing Prp16 binding to the spliceosome before 
branching“. We propose that the branch helix and 3’-exon confer 
specificity for auxiliary factors such as Cwc25-Yju2 and Slu7-Prp18, which 
may act as adaptors that determine the identity of the next DEAH box 
helicase to remodel the active site. 

The structure of the Schizosaccharomyces pombe spliceosomal 
complex (refs 26, 48) contains a lariat intron but not the 5’-exon or the spliced 
mRNA. The catalytic RNA core is surrounded by a similar set of 
NTC and NTR proteins but the structure lacks step-one or step-two 
factors (refs 26, 48), suggesting this corresponds to a post-splicing ILS (ref. 49). Instead, 
Cwf19, a homologue of the debranching enzyme co-factor Drn1 
(ref. 50), intrudes between the Large and RNaseH-like domains of Prp8, 
occupying the binding sites for Isy1, Cwc25 and Yju2 found in our 
complex C. Cwf19 marks the ILS complex for disassembly by displacing 
the branch helix, which rotates by 90° in ILS with respect to complex C 
(Fig. 5c, Extended Data Fig. 7). 

A pronounced conformational change between ILS and complex C 
is a large rotation of the NTC (Extended Data Fig. 7d). In ILS the N terminus 
of Syf1 moves away from the core, promoting undocking of U2 
snRNP. In complex C, the position of U2 snRNP is stabilized by the 
formation of stem IIc and binding of Prp19. U2 snRNP is in direct contact 
with the RNaseH-like domain of Prp8, which holds Cwc25 in place. This 
network of interactions suggests that binding of Prp19 and formation of 
stem IIc in U2 snRNA may have an allosteric effect on the positioning 
of the branch helix via step-one factors. Extended arches of Syf1 and 
Clf1 may have a role in communicating the signal over long distance. 

Our spliceosomal complex C structure reveals the active configuration 
of the catalytic core, elucidating the arrangement of the RNA 
substrate and its interaction with proteins. The structure accounts for a 
large body of biochemical and genetic data and provides crucial insights 
into substrate docking and catalysis and the role of DEAH helicases and 
auxiliary factors in spliceosome remodelling. 

Figure 6 | The role of helicases in active site remodelling. a, The intron 
sequence downstream from the branch site exits the spliceosome via 
a channel in Prp8 and extends towards Prp16. Translocation of Prp16 
towards the branch helix would destabilize step-one factors and displace the 
branch helix from its pocket. BP, branch point; OB, 
oligonucleotide/oligosaccharide binding domain. b, Schematic illustrating 
how step-one- or step-two-specific factors can determine the 
specificity of the helicase recruited to the spliceosome at particular 
stages of splicing. 

Received 9 June; accepted 19 July 2016. 

Published online 26 July 2016. 


1. Will, C. L. & Lührmann, R. Spliceosome structure and function. Cold Spring 
Harb. Perspect. Biol. 3, a003707 (2011). 

2. Burge, C. B., Tuschl, T. & Sharp, P. A. in The RNA World II (eds Gesteland, R. F., 
Cech, T. R. & Atkins, J. F.) 525-560 (Cold Spring Harbor Laboratory Press, 
1999). 

3. Lambowitz, A. M. & Zimmerly, S. Group II introns: mobile ribozymes that 
invade DNA. Cold Spring Harb. Perspect. Biol. 3, a003616 (2011). 

4. Chan, S.-P., Kao, D.-I., Tsai, W.-Y. & Cheng, S.-C. The Prp19p-associated complex 
in spliceosome activation. Science 302, 279-282 (2003). 

5. Ohi, M. D. & Gould, K. L. Characterization of interactions among the Cef1p- 
Prp19p-associated splicing complex. RNA 8, 798-815 (2002). 

6. Fabrizio, P. et al. The evolutionarily conserved core design of the catalytic 
activation step of the yeast spliceosome. Mol. Cell 36, 593-608 (2009). 

7. Lesser, C. F. & Guthrie, C. Mutations in U6 snRNA that alter splice site 
specificity: implications for the active site. Science 262, 1982-1988 (1993). 

8. Kandels-Lewis, S. & Séraphin, B. Involvement of U6 snRNA in 5’ splice site 
selection. Science 262, 2035-2039 (1993). 

9. Raghunathan, P. L. & Guthrie, C. RNA unwinding in U4/U6 snRNPs requires ATP 
hydrolysis and the DEIH-box splicing factor Brr2. Curr. Biol. 8, 847-855 (1998). 

10. Laggerbauer, B., Achsel, T. & Lührmann, R. The human U5-200kD DEXH-box 
protein unwinds U4/U6 RNA duplices in vitro. Proc. Natl Acad. Sci. USA 95, 
4188-4192 (1998). 

11. Newman, A. J. & Norman, C. U5 snRNA interacts with exon sequences at 5’ 
and 3’ splice sites. Cell 68, 743-754 (1992). 

12. Sontheimer, E. J. & Steitz, J. A. The U5 and U6 small nuclear RNAs as active 
site components of the spliceosome. Science 262, 1989-1996 (1993). 

13. Ohrt, T. et al. Molecular dissection of step 2 catalysis of yeast pre-mRNA 
splicing investigated in a purified system. RNA 19, 902-915 (2013). 

14. Cordin, O., Hahn, D. & Beggs, J. D. Structure, function and regulation of 
spliceosomal RNA helicases. Curr. Opin. Cell Biol. 24, 431-438 (2012). 

15. Schwer, B. A conformational rearrangement in the spliceosome sets the stage 
for Prp22-dependent mRNA release. Mol. Cell 30, 743-754 (2008). 

16. Company, M., Arenas, J. & Abelson, J. Requirement of the RNA helicase-like 
protein PRP22 for release of messenger RNA from spliceosomes. Nature 349, 
487-493 (1991). 

17. Tsai, R.-T. et al. Spliceosome disassembly catalyzed by Prp43 and its 
associated components Ntr1 and Ntr2. Genes Dev. 19, 2991-3003 (2005). 

18. Abelson, J. et al. Conformational dynamics of single pre-mRNA molecules 
during in vitro splicing. Nat. Struct. Mol. Biol. 17, 504-512 (2010). 

19. Nguyen, T. H. D. et al. Cryo-EM structure of the yeast U4/U6.U5 tri-snRNP at 
3.7 Å resolution. Nature 530, 298-302 (2016). 

20. Wan, R. et al. The 3.8 Å structure of the U4/U6.U5 tri-snRNP: Insights into 
spliceosome assembly and catalysis. Science 351, 466-475 (2016). 

21. Galej, W. P., Oubridge, C., Newman, A. J. & Nagai, K. Crystal structure of Prp8 
reveals active site cavity of the spliceosome. Nature 493, 638-643 (2013). 

22. Rasche, N. et al. Cwc2 and its human homologue RBM22 promote an active 
conformation of the spliceosome catalytic centre. EMBO J. 31, 1591-1604 (2012). 

23. Hilliker, A. K., Mefford, M. A. & Staley, J. P. U2 toggles iteratively between the 
stem IIa and stem IIc conformations to promote pre-mRNA splicing. Genes 
Dev. 21, 821-834 (2007). 

24. Perriman, R. J. & Ares, M., Jr. Rearrangement of competing U2 RNA helices 
within the spliceosome promotes multiple steps in splicing. Genes Dev. 21, 
811-820 (2007). 

25. Grainger, R. J., Barrass, J. D., Jacquier, A., Rain, J.-C. & Beggs, J. D. Physical and 
genetic interactions of yeast Cwc21p, an ortholog of human SRm300/SRRM2, 
suggest a role at the catalytic center of the spliceosome. RNA 15, 2161-2173 
(2009). 

26. Yan, C. et al. Structure of a yeast spliceosome at 3.6-angstrom resolution. 
Science 349, 1182-1191 (2015). 

27. Nguyen, T. H. D. et al. Structural basis of Brr2-Prp8 interactions and 
implications for U5 snRNP biogenesis and the spliceosome active site. 
Structure 21, 910-919 (2013). 

28. van Nues, R. W. & Beggs, J. D. Functional contacts with a range of splicing proteins 
suggest a central role for Brr2p in the dynamic control of the order of events in 
spliceosomes of Saccharomyces cerevisiae. Genetics 157, 1451-1467 (2001). 

29. Smith, D. J., Query, C. C. & Konarska, M. M. “Nought may endure but 
mutability”: spliceosome dynamics and the regulation of splicing. Mol. Cell 30, 
657-666 (2008). 

30. Konarska, M. M., Vilardell, J. & Query, C. C. Repositioning of the reaction 
intermediate within the catalytic center of the spliceosome. Mol. Cell 21, 
543-553 (2006). 

31. Kim, C. H. & Abelson, J. Site-specific crosslinks of yeast U6 snRNA to the 
pre-mRNA near the 5’ splice site. RNA 2, 995-1010 (1996). 


(2013). 

33. Madhani, H. D. & Guthrie, C. A novel base-pairing interaction between U2 and 
U6 snRNAs suggests a mechanism for the catalytic activation of the 
spliceosome. Cell 71, 803-817 (1992). 

34. Fica, S. M., Mefford, M. A., Piccirilli, J. A. & Staley, J. P. Evidence for a group II 
intron-like catalytic triplex in the spliceosome. Nat. Struct. Mol. Biol. 21, 
464-471 (2014). 

35. Steitz, T. A. & Steitz, J. A. A general two-metal-ion mechanism for catalytic RNA. 
Proc. Natl Acad. Sci. USA 90, 6498-6502 (1993). 

36. Fica, S. M. et al. RNA catalyses nuclear pre-mRNA splicing. Nature 503, 
229-234 (2013). 

37. Robart, A. R., Chan, R. T., Peters, J. K., Rajashankar, K. R. & Toor, N. Crystal 
structure of a eukaryotic group II intron lariat. Nature 514, 193-197 (2014). 

38. Chen, H.-C., Tseng, C.-K., Tsai, R.-T., Chung, C.-S. & Cheng, S.-C. Link of 
NTR-mediated spliceosome disassembly with DEAH-box ATPases Prp2, Prp16, 
and Prp22. Mol. Cell. Biol. 33, 514-525 (2013). 

39. Warkocki, Z. et al. Reconstitution of both steps of Saccharomyces cerevisiae 
splicing with purified spliceosomal components. Nat. Struct. Mol. Biol. 16, 
1237-1243 (2009). 

40. Chiu, Y.-F. et al. Cwc25 is a novel splicing factor required after Prp2 and Yju2 to 
facilitate the first catalytic reaction. Mol. Cell. Biol. 29, 5671-5678 (2009). 

41. Krishnan, R. et al. Biased Brownian ratcheting leads to pre-mRNA remodeling 
and capture prior to first-step splicing. Nat. Struct. Mol. Biol. 20, 1450-1457 
(2013). 

42. Voorhees, R. M., Weixlbaumer, A., Loakes, D., Kelley, A. C. & Ramakrishnan, V. 
Insights into substrate stabilization from snapshots of the peptidyl transferase 
center of the intact 70S ribosome. Nat. Struct. Mol. Biol. 16, 528-533 (2009). 

43. Semlow, D. R., Blanco, M. R., Walter, N. G. & Staley, J. P. Spliceosomal 
DEAH-Box ATPases remodel pre-mRNA to activate alternative splice sites. 

Cell 164, 985-998 (2016). 

44. Tseng, C.-K., Liu, H.-L. & Cheng, S.-C. DEAH-box ATPase Prp16 has dual roles in 
remodeling of the spliceosome in catalytic steps. RNA 17, 145-154 (2011). 

45. Schwer, B. & Guthrie, C. A conformational rearrangement in the spliceosome is 
dependent on PRP16 and ATP hydrolysis. EMBO J. 11, 5033-5039 (1992). 

46. Villa, T. & Guthrie, C. The Isy1p component of the NineTeen complex interacts 
with the ATPase Prp16p to regulate the fidelity of pre-mRNA splicing. 

Genes Dev. 19, 1894-1904 (2005). 

47. Umen, J. G. & Guthrie, C. Prp16p, Slu7p, and Prp8p interact with the 3’ splice 
site in two distinct stages during the second catalytic step of pre-mRNA 
splicing. RNA 1, 584-597 (1995). 

48. Chen, W. et al. Endogenous U2-U5-U6 snRNA complexes in S. pombe are intron 
lariat spliceosomes. RNA 20, 308-320 (2014). 

49. Nguyen, T. H. D. et al. CryoEM structures of two spliceosomal complexes: 
starter and dessert at the spliceosome feast. Curr. Opin. Struct. Biol. 36, 48-57 
(2016). 

50. Garrey, S. M. et al. A homolog of lariat-debranching enzyme modulates 
turnover of branched RNA. RNA 20, 1337-1348 (2014). 


Supplementary Information is available in the online version of the paper. 


Acknowledgements We thank K. Nguyen, X.-C. Bai and S. Scheres for their 
help and advice on data collection and processing; C. Savva, S. Chen, K. R. 
Vinothkumar, G. McMullan, J. Grimmett and T. Darling for smooth running of 
the EM and computing facilities; the mass spectrometry facility for help with 
protein identification; A. Brown, P. Emsley, G. Murshudov and J. Llacer for help 
and advice with model building and refinement; L. Langer and the members 
of the spliceosome group for help and advice throughout the project. We thank 
J. Löwe, V. Ramakrishnan, D. Barford and R. Henderson for their continuing 
support, S. Scheres, C. Plaschka and L. Strittmatter for critical reading of the 
manuscript and J. Vilardell for a generous gift of reagent. The project was 
supported by the Medical Research Council (MC_U105184330). M.E.W. was 
supported by a Rutherford Memorial Cambridge Scholarship, S.M.F. by EMBO 
and Marie Skłodowska-Curie fellowships. 


Author Contributions W.P.G., M.E.W. and S.M.F. established experimental 
procedures; W.P.G. and M.E.W. prepared the sample and grids, and processed 
EM data. W.P.G., M.E.W. and S.M.F. collected EM data. W.P.G., M.E.W. and C.O. 
carried out model building and refinement. W.P.G., M.E.W., S.M.F., C.O. and K.N. 
analysed the structure. A.J.N. contributed to the project through his knowledge 
and experience on yeast splicing. The manuscript was written by W.P.G., M.E.W. and 
K.N. and finalized with input from all authors. K.N. initiated and orchestrated the 
spliceosome project. 


Author Information The cryo-EM maps have been deposited in the Electron 
Microscopy Data Bank with accession codes EMD-4055, EMD-4056, 
EMD-4057, EMD-4058 and EMD-4059. The coordinates of the atomic models 
have been deposited in the Protein Data Bank under accession code 5LJ3 
(core of the complex) and 5LJ5 (overall structure). Reprints and permissions 
information is available at www.nature.com/reprints. The authors declare no 
competing financial interests. Readers are welcome to comment on the online 
version of the paper. Correspondence and requests for materials should be 
addressed to W.P.G. (wgalej@mrc-lmb.cam.ac.uk) or K.N. (kn@mrc-lmb.cam. 
ac.uk). 


8 SEPTEMBER 2016 | VOL 537 | NATURE | 201 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 




METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

Prp18-HA and Slu7-TAPS tagging. SLU7-TAPS homologous recombination 
cassettes were generated by PCR from pFA6a-TAPS-kanMX6, a modified version 
of pFA6a-TAP-kanMX6 in which the Calmodulin-binding peptide tag is replaced 
by two tandem copies of the StrepII tag*’. The PCR product was used to transform 
yeast strain YSCC1 (MATa pre1 prb1 pep4 leu2 trp1 ura3 PRP19-HA)' selecting 
for G418 resistance. A PRP18-3xHA kanMX6 cassette was transformed into the BY4741 
strain (MATa his3Δ1 leu2Δ0 met15Δ0 ura3Δ0) and selected as above. Integration 
of the cassettes was confirmed by PCR and western blotting. 

Sample preparation. The Prp18-HA or Slu7-TAPS yeast strains were grown in a 
120 l fermenter, and splicing extract was prepared using the liquid nitrogen method*® 
essentially as previously described*’. A DNA template for in vitro transcription 
was generated by addition of 2 x MS2 stem loops*? to the 5’-end of the UBC4 
pre-mRNA sequence’®, in which the 3’-splice site sequence UAGAG was mutated 
to UACAC. Pre-mRNA substrate was generated by run-off transcription from a 
plasmid DNA template and labelled at the 3’-end with fluorescein-5-thiosemi- 
carbazide™. In vitro splicing reactions were assembled using pre-mRNA substrate 
pre-bound to MS2-MBP fusion protein as previously described®*. The resulting 
spliceosomes were bound by amylose resin in HE-75 (20mM HEPES KOH pH 7.8, 
75mM KCl, 0.25mM EDTA, 5% glycerol, 0.01% NP-40) and eluted with 12 mM 
maltose. The sample was subsequently immobilised on either anti-HA-agarose 
(for Prp18-HA yeast extract) or Streptactin resin (for Slu7-TAPS yeast extract) in 
HE-100 (20mM HEPES KOH pH 7.8, 100 mM KCl, 0.25mM EDTA, 5% glycerol, 
0.01% NP-40) and eluted with either HA peptide (for anti-HA-agarose) or 
desthiobiotin (for Streptactin resin), essentially as described°’. The eluate was finally 
dialysed against HE-75 buffer (without glycerol and NP-40) for EM sample prepa- 
ration. Analysis of fluorescently labelled RNA showed that pre-mRNA is converted 
to the lariat intron-3’-exon intermediate in our sample and hence it is referred to 
as complex C (Extended Data Fig. 1b). Our experimental set-up was designed to 
purify step-two complexes after Prp16 action; however, the presence of step-one 
factors in the structure and the configuration of the active site clearly indicate that the 
complex has not undergone Prp16-mediated remodelling. It has been shown 
previously’ that in low salt conditions Prp18, Slu7 and Prp16 associate with complexes 
B* and C. Analysis of protein components by gel electrophoresis and subsequent 
mass spectrometry shows that Prp16 as well as Prp22 are present, in agreement 
with the previous results (Extended Data Fig. 1a; Extended Data Table 2) (refs 6, 13, 43). 

Electron microscopy. For cryo-EM analysis, Quantifoil R2/2 Cu 400 mesh grids 
were coated with a 5-7 nm-thick layer of homemade carbon film and glow 
discharged. After applying 3 µl of the sample, the grids were blotted for 2.5-3 s and 
vitrified in liquid ethane in an FEI Vitrobot MKIII, at 100% humidity and 4 °C. Grids 
were loaded into an FEI Titan Krios transmission electron microscope operated 
at 300 kV and imaged using a Gatan K2 Summit direct electron detector and a GIF 
Quantum energy filter (slit width 20 eV). Images were collected in super-resolution 
counting mode at 1.25 frames s⁻¹ and a calibrated pixel size of 1.43 Å. A total dose 
of 40 e⁻ Å⁻² over 16 s and a defocus range of 0.5-4 µm were used. 
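
The acquisition parameters above fix a few derived quantities that are useful when checking the data set. The short sketch below is illustrative arithmetic only: the variable names are ours, and it assumes that the stated 1.43 Å pixel size is the sampling of the images used for reconstruction.

```python
# Illustrative arithmetic from the stated acquisition parameters.
# Variable names are hypothetical and not taken from the paper.
frame_rate_hz = 1.25     # frames per second (stated)
exposure_s = 16.0        # total exposure time in seconds (stated)
total_dose = 40.0        # total dose in electrons per square angstrom (stated)
pixel_size_A = 1.43      # calibrated pixel size in angstroms (stated)

n_frames = frame_rate_hz * exposure_s     # movie frames per micrograph
dose_per_frame = total_dose / n_frames    # electrons per square angstrom per frame
dose_rate = total_dose / exposure_s       # electrons per square angstrom per second
# Assumption: 1.43 A is the sampling of the processed images, so the
# Nyquist limit is twice that value.
nyquist_A = 2.0 * pixel_size_A

print(f"frames per movie: {n_frames:.0f}")
print(f"dose per frame:   {dose_per_frame:.2f} e/A^2")
print(f"dose rate:        {dose_rate:.2f} e/A^2/s")
print(f"Nyquist limit:    {nyquist_A:.2f} A")
```

Under that assumption the reported resolutions (3.8 Å and coarser) all lie within the Nyquist limit of the sampling.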

Image processing. A total of 2,213 micrographs were subjected to whole-frame 
drift correction in MOTIONCORR®™ followed by contrast transfer function (CTF) 
parameter estimation in CTFFIND4 (ref. 57). All subsequent processing steps 
were done using RELION*® unless otherwise stated. An initial subset of 5,000 
particles was selected manually and subjected to reference-free 2D classification. 
Resulting 2D class averages were low-pass filtered to 20 Å and used as templates 
for subsequent automated particle picking within RELION”. A total of 247,603 
particles were selected after initial reference-free 2D classification and subjected to 
3D classification (Extended Data Fig. 2). An initial 3D reference was prepared by 
scaling and low-pass filtering (60 Å) the reconstruction of the intron-lariat 
complex (EMD-6413). A subset of 93,106 particles was selected after 3D classification. 
Particle-based beam-induced motion correction and radiation-damage weighting 
(particle polishing) followed by 3D refinement resulted in a final reconstruction 
at 3.8 Å overall resolution and estimated accuracies of rotations of 1.1° (Extended 
Data Fig. 3). 

Very weak density observed at two peripheral regions of the map corresponds 
to Brr2/Prp16 (helicase module) and Prp19/Cef1/Snt309 (Prp19 module). We 
used focused classification with signal subtraction to improve the resolution of 
these regions. The region of interest was masked out and the projection of the 
remaining map was subtracted from the experimental particles using angular 
assignment from the last iteration of the 3D auto-refine run. Subtracted particles 
were 3D classified without image alignment and the best classes were selected 
for further refinement of the original (not subtracted) particles. This resulted in 
a smaller subset of the original particles, in which Brr2/Prp16 and Prp19/Cef1/ 


Snt309 are more homogeneous and consequently the density is improved in those 
regions (Extended Data Figs 2 and 3). 3D refinement of the selected 29,210 Prp19- 
selected particles resulted in a map at overall 5.1 Å resolution, while 15,872 of 
the helicase-containing particles yielded a map at 10 Å resolution. For the global 
classification approach we generated a soft mask around the core of the com- 
plex and classified polished particles with finer angular sampling of 1.8° and local 
searches of 10°. The resulting two major classes of 37K and 47K particles were 
refined to 4.1 Å and 3.9 Å respectively. They revealed a subtle conformational 
change of the U2 snRNP and Syf1 HAT arch correlated with the presence of WD40 
domain near the stem IIc and IIb region of U2 snRNA. This WD40 domain belongs 
to Prp17 or Prp19, but the local resolution did not allow us to make an unambig- 
uous assignment. All reported resolutions are based on the gold-standard Fourier 
shell correlation (FSC) = 0.143 criterion®’. FSC curves were calculated using soft 
spherical masks and high-resolution noise substitution was used to correct for 
convolution effects of the masks on the FSC curves™. Prior to visualization, all 
maps were corrected for the modulation transfer function of the detector. Local 
resolution was estimated using RESMAP®. 
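
As an illustration of the resolution criterion mentioned above, the following is a minimal sketch of a Fourier shell correlation between two half-maps and of reading off the resolution at the 0.143 threshold. It is not the RELION implementation: it applies no mask and no high-resolution noise substitution, the function names `fsc_curve` and `resolution_at_threshold` are hypothetical, and the "half-maps" are a synthetic Gaussian blob with independent noise rather than real data.

```python
import numpy as np


def fsc_curve(half1, half2):
    """Fourier shell correlation between two cubic maps of identical shape."""
    if half1.ndim != 3 or half1.shape != half2.shape or len(set(half1.shape)) != 1:
        raise ValueError("expected two cubic 3D maps of the same size")
    n = half1.shape[0]
    f1 = np.fft.fftn(half1)
    f2 = np.fft.fftn(half2)

    # Radial shell index (in Fourier voxels) for every coefficient.
    k = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)

    n_shells = n // 2 + 1      # shells 0 .. Nyquist
    keep = shell < n_shells    # ignore the corners beyond Nyquist
    idx = shell[keep]

    # Per-shell sums of the cross term and of the two power spectra.
    cross = np.bincount(idx, weights=(f1 * np.conj(f2)).real[keep], minlength=n_shells)
    power1 = np.bincount(idx, weights=(np.abs(f1) ** 2)[keep], minlength=n_shells)
    power2 = np.bincount(idx, weights=(np.abs(f2) ** 2)[keep], minlength=n_shells)

    denom = np.sqrt(power1 * power2)
    fsc = np.zeros(n_shells)
    np.divide(cross, denom, out=fsc, where=denom > 0)
    return fsc


def resolution_at_threshold(fsc, box_size, voxel_size_A, threshold=0.143):
    """Resolution (angstroms) of the first shell where the FSC drops below threshold."""
    below = np.nonzero(fsc[1:] < threshold)[0]
    if below.size == 0:
        return 2.0 * voxel_size_A          # never crosses: limited by Nyquist
    first_shell = below[0] + 1             # +1 because shell 0 was skipped
    return box_size * voxel_size_A / first_shell


# Synthetic demonstration (box size and noise level are arbitrary choices).
rng = np.random.default_rng(0)
n, voxel = 32, 1.43
ax = np.arange(n) - n // 2
x, y, z = np.meshgrid(ax, ax, ax, indexing="ij")
signal = np.exp(-(x**2 + y**2 + z**2) / (2 * 3.0**2))   # toy "particle"
half1 = signal + 0.05 * rng.normal(size=signal.shape)
half2 = signal + 0.05 * rng.normal(size=signal.shape)

fsc = fsc_curve(half1, half2)
print("estimated resolution (A):", resolution_at_threshold(fsc, n, voxel))
```

The two half-maps share the signal but carry independent noise, so the correlation is close to 1 in the low-frequency shells and falls towards 0 where noise dominates; the shell at which it crosses 0.143 is converted to ångströms using the box size and voxel size.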

Model building. A list of protein and RNA components included in the model is 
given in Extended Data Table 2. Building started by docking known structures of 
S. cerevisiae Prp8, Snu114, U5 Sm ring, U5 snRNA’, Cwc2 (ref. 64) and Bud31 
(ref. 65) into the map. Homology models for Cef1, Prp45, Prp46, Ecm2 and Cwc15 
were built with SWISS-MODEL™, using structures from the S. pombe intron-lariat 
spliceosome (ref. 26) as templates, and were docked into the map. This accounted for the 
majority of the protein density in the core, allowing building of the intron, U6 
snRNA and U2 snRNA. RNA extending from the loop 1 of U5 snRNA was assigned 
to nucleotides −1 to −16 of the 5’-exon as previously predicted (ref. 11). A model for 
the NTD of Cwc22 was built using SWISS-MODEL based on the structure of 
the human Cwc22:eIF4AIII complex” and docked near Snu114. Clear density 
near the NTD of Cwc22 was interpreted as the MA3 domain at the C terminus of 
Cwc22; this domain was built de novo. A coiled-coil was found contacting domain 
IV of Snu114. Based on an unpublished NMR structure from Arabidopsis thaliana 
(PDB ID: 2E62) and biochemical data?’ we assigned this density to the CTD of 
Cwc21. Weak density was observed connecting this coiled-coil to a peptide con- 
tacting the 5’-exon. We therefore assigned this peptide as the N terminus of Cwc21. 
Unassigned density remained near the branch-point helix. Based on secondary 
structure prediction” we assigned a portion of this density to Yju2 and were able 
to build its NTD de novo; our assignment was supported by clear density for a zinc 
atom coordinated by four conserved cysteines. The remainder of the density could 
then be assigned to the N termini of Cwc25 and Isy1. 

The majority of the model building described above was for the core of the 
spliceosome where the resolution was uniformly between 3.5 and 4.5 Å (Extended Data 
Fig. 4). For the periphery of the complex, the resolution was more heterogeneous, 
ranging from 4 to 20 Å. Clear features of the periphery were two large proteins 
with extended architectures. One of these proteins started in the core and pro- 
jected outwards to the periphery. At the core, side-chains were easily visible for this 
protein and allowed assignment as the N terminus of Clf1. Towards the C terminus 
of Clf1 the resolution only allowed building of idealised poly-alanine helices, 
which were then assigned sequence based on secondary structure predictions”. 
For the other extended protein, few side-chains were visible but helices could be 
distinguished. This protein was generally built as poly-alanine helices, and based 
on secondary structure predictions” was assigned as Syfl. A second Sm ring at 
medium-resolution was found in the map and was assigned as the U2 snRNA Sm 
ring. Homology models for the U2 snRNP proteins Leal and Msl1 were generated 
using SWISS-MODEL®* based on the structure of the human U2B/”-U2A’-U2 
snRNA complex® and were docked into the adjacent density. The portion of the 
U2 snRNA in contact with Msl1 was most consistent with the previously proposed 
stem IV + stem V architecture and was built based on the secondary structure 
prediction®. Two RNA double helices were observed bridging the U2 Sm ring to 
Ecm2 and were assigned as stems IIb and IIc of the U2 snRNA. Using 3D classifi- 
cation, we found that some of the particles contained a large lobe of extra density 
connected to the RT-like and RNaseH-like domains of Prp8 (see above). Although 
we could not resolve secondary structure in this region, we could perfectly dock 
the crystal structure of Brr2 and the Jabl1/MPN domain of Prp8 (ref. 27). The 
remainder of the density could then well accommodate an I-TASSER” homology 
model of Prp16 based on the crystal structure of Prp43 (ref. 71). Weak density 
connected to Clf1 and Syfl1 had the characteristic shape of Prp19-Snt309-Cefl 
(ref. 26). Focused classification in this region could improve the density enough 
to resolve the U-box dimers and thus dock a homology model of these proteins. 
Finally, three copies of the Prp19 WD40 domain crystal structure could be docked 
into very weak density adjacent to the Prp19 coiled-coils. With the exception of the 
helicase and Prp19 modules all models were manually rebuilt in order to obtain the 
best fit to the cryo-EM density. The model was refined using REFMAC 5.8 (ref. 72) 
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with secondary structure restraints generated in PROSMART” and RNA base-pair 
and stacking restraints generated in LIBG”*. Extended Data Table 1 summarizes 
refinement statistics and PBD and EMDB accession codes. 

Map visualization. Maps were visualized in Chimera and figures were prepared
using PyMOL (http://www.pymol.org). 

Extended Data Figure 1 | Biochemical characterization of the complex
and initial cryo-EM analysis. a, SDS-PAGE analysis of the purified
sample. Protein identities were confirmed by mass spectrometry analysis.
Protein labels are coloured according to sub-complex identity (dark blue,
U5 snRNP; light blue, helicase module; orange, NTC; yellow, NTR; green,
U2 snRNP; purple, splicing factors; grey, not found in density). b, Analysis
of the fluorescently labelled substrate in the sample by denaturing PAGE,
showing conversion of linear pre-mRNA (time point 0′) into branched
lariat-intron intermediate (time point 30′), which is a predominant species
in the purified sample (C complex). The two hairpins on the right depict
the 2 × MS2 stem-loops attached to the 5′ end of the UBC4 pre-mRNA
substrate for affinity purification. c, A typical cryo-EM micrograph
collected on an FEI Titan Krios microscope operated at 300 kV and
detected with a Gatan K2 Summit camera. d, Reference-free 2D
classification results. e, Detail of a single class average with major domains
labelled.

Extended Data Figure 2 | Overview of the data processing scheme used
in this study. Iterative 2D classification, template selection and automated
particle picking resulted in 248K particles, which were classified in 3D
with a scaled and low-pass-filtered model of ILS (EMDB-6413) as a
reference. The best class was refined to 3.8 Å resolution overall. Focused
classification allowed us to obtain two other maps with improved quality
of the peripheral regions (Prp19 and helicase modules, EMD-4056 and
EMD-4057). Classification of the core complex with fine angular sampling
and local searches revealed a subtle movement of the U2 snRNP which
correlates with the appearance of the extra density, interpreted as a WD40
domain which belongs to Prp17 or Prp19.

Extended Data Figure 3 | Global and local resolution analysis. a, Two orthogonal sections through the map showing variation in the local resolution
as estimated by ResMap. b, An overall map of the core complex. c, Gold-standard FSC plots for three maps used in this study. d, Map of the core complex
with a helicase module. e, A map of the core complex with Prp19 module.

Extended Data Figure 4 | Examples of cryo-EM density at the core
of the complex with atomic models built in. a, U5 snRNA loop 1 with
5′-exon bound. b, The active site with exon, intron, U2 and U6 snRNAs.
c, Two helices of the Prp8 reverse transcriptase thumb/X domain, showing
a clear helical pitch and excellent densities for the side chains. d, Fourier
Shell Correlation between model and the map and cross-validation of the
model fitting. The original atom positions were randomly displaced by
up to 0.5 Å and refined with restraints against the half1 map only. FSC was
calculated for the two half maps. Excellent correlation up to high resolution
between the model and the half2 map (which was not used in refinement)
cross-validates the model against overfitting.

Extended Data Figure 5 | Metal binding by the catalytic core of C
complex. a, b, Structure (a) and schematic representation (b) of the active
site of a group IIC intron trapped in the pre-catalytic state in the presence
of Ca²⁺ (PDB 4FAQ, ref. 76). The 5′ splice site scissile phosphate is aligned
with the two metals bound at the core in a catalytic configuration, as
shown in b. Note that, in this pre-catalytic structure, the group II domain
VI is not present and therefore the structure does not contain the bulged
adenosine nucleophile required for the branching reaction. As a result, the
nucleophile is a water molecule, rather than the 2′-OH of the branch site
adenosine found in spliceosomal introns. c, d, e, Structure of the RNA at
the active site of spliceosomal C complex, showing the overall architecture
(c), schematic of metal binding (d), and comparison of the model with
the EM density (e). Note conservation of the metal binding residues
compared to the group II intron (compare with ref. 36) and proximity of
the cleaved G(−1)–G(+1) bond to putative M1. f, Proposed interactions
between U6 snRNA and the two catalytic Mg²⁺ during the transition state
for branching, as inferred from biochemistry. g, h, Structure (g) and
schematic (h) of the RNA core of the U2.U6.U5 ILS complex in a
post-catalytic configuration (PDB 3JB9, ref. 26), probably following release
of the mRNA. The two Mg²⁺ are shown as modelled in the coordinates
deposited by the authors of the ILS structure (PDB 3JB9, ref. 26). In the
ILS structure M1 and M2 are further apart (7.2 Å) than in most other
structures of RNAs that coordinate catalytic metals (usually 3.9–5 Å);
nonetheless, the ligands modelled for M1 and M2 are consistent with the
ligands identified biochemically for the two catalytic Mg²⁺ necessary for
splicing (compare PDB 3JB9 and 4ROD with the data in refs 34,36). Note
that the branch helix is undocked from the U6 snRNA metal binding site
and G(+1) is far away from the two Mg²⁺ at the core. The substrate and
snRNAs are colour-coded while residues that position the catalytic metals
are shown in magenta.

Extended Data Figure 6 | Examples of the structures of isolated components. De novo-built proteins are shown in cartoon form, along with a
secondary structure diagram for the novel zinc-finger fold of Yju2. Proteins that were modelled into low-resolution regions by rigid-body docking
of crystal structures or homology models (Prp19 module, Brr2, Prp16, Prp8 Jab1/MPN domain) are shown in their cryo-EM densities.

Extended Data Figure 7 | Conformational changes between U4/U6.U5
tri-snRNP, complex C and intron-lariat spliceosome. a, Rearrangement
of the RNaseH-like domain with respect to the main body of Prp8 in all
three complexes. b, α-Finger (residues 1,575–1,598) contacting the key RNA and
proteins in a context-dependent manner. c, Prp8 N-terminal domain
movements along with Prp8 residues 1,406–1,436 transiently docking on
top of the 5′-exon and Cwc21 in complex C, stabilizing the 5′-exon and
interdomain contacts in Prp8. d, Conformational rearrangements between
complex C and S. pombe ILS showing a coupled movement of the U2
snRNP, Syf1 and Prp19.

Extended Data Figure 8 | Implications for deposition of the
exon-junction complex. In higher eukaryotes exon-junction complexes
(EJCs) are deposited 20–24 nucleotides (nt) upstream of splice junctions,
and form a binding platform for factors involved in nuclear export,
translation, alternative splicing and nonsense-mediated mRNA decay.
The core EJC components eIF4AIII, MAGOH and Y14 are found in
human B and C complexes. Cwc22 is required for eIF4AIII recruitment
to spliceosomes and holds it in an open, inactive conformation.
a, Crystal structure of the eIF4AIII–Cwc22 complex docked onto the
spliceosomal C complex via superposition on Cwc22. b, Crystal structure
of the core EJC superimposed on the previous model via the second
RecA domain of eIF4AIII. c, The 5′-exon exiting the channel at the
interface between the Prp8 Large and N-terminal domains is positioned
perfectly for the deposition of the EJC, explaining how the Cwc22 MIF4G
domain is involved in determining the distance of EJC deposition from the
splice junction.

Extended Data Table 1 | Cryo-EM data collection and refinement statistics


Data collection
Microscope
Voltage (kV)
Electron dose (e⁻ Å⁻²)
Detector
Pixel size (Å)
Defocus range (μm)
Reconstruction (RELION)
Particles
Box size (pix)
Accuracy of rotations (°)
Accuracy of translations (pix)
Map sharpening B-factor (Å²)
Final resolution (Å)
Model composition
Protein residues
RNA bases
Ligands
Refinement (REFMAC)
Resolution (Å)
FSC average
R factor
R.m.s. deviations
Bond lengths (Å)
Bond angles (°)
Validation†
MolProbity score
Clashscore, all atoms
Good rotamers (%)
Ramachandran plot
Favoured (%)
Outliers (%)
RNA validation†
Correct sugar puckers (%)
Good backbone conformations (%)
Deposition
PDB ID
EMDB ID


Core 


FEI Titan Krios 
300 

40 

Gatan K2 Summit 
1.43 

0.5-4.0 


93 106 
412 
1.13 
0.64 
-57 
3.75 


7447 
458 
10 


3.8 
0.82 
0.32 


0.007 
1.25 


2.5 (98th percentile)
5.3 (100th percentile)


80 


5LJ3
EMDB-4055 


Core+Prp19


FEI Titan Krios 


Core+helicase


FEI Titan Krios 


300 300 
40 40 
Gatan K2 Summit Gatan K2 Summit 
1.43 1.43 
0.5-4.0 0.5-4.0 
29 210 15 872 
412 412 
1.13 1.51 
0.96 1.30 
AT -350 
5.08 9.78 
EE at 
11978" 
458 
10 
5LJ5* 5LJ5* 
EMDB-4056 EMDB-4057 


*Represents a sub-set of the whole data set (Core).
†Determined by MolProbity.
‡Overall model including Prp19 and helicase modules.

Extended Data Table 2 | Summary of model building for spliceosomal complex C

Proteins and RNA included in the model

Columns: sub-complex, protein/RNA, domains, total residues, M.W. (Da), residues modelled, modelling template (PDB ID), modelling method, resolution (Å), chain ID, human/S. pombe names.
Prp8 N-terminal 1-870 101,767 128-870 5GAN Docked & rebuilt 3.4-5.8 A 220K/Spp42
Large 871-1827 111,525 871-1827 5GAN Docked & rebuilt 3.6 - 6.2
RNaseH 1828-2085 29,453 1837-2085 5GAN Docked & rebuilt 4.2-6.6
Jab1/MPN 2086-2413 36,812 2148-2396 4BGD Rigid docking ~15- 20
Snu114 1008 114,041 67-998 5GAN Docked & rebuilt 3.8-7.2 c 116K/Cwf10
SmB 196 22,403 4-102 5GAN Docked 4.6-7.2 b SmB/SmB
U5 snRNP SmD3 110 11,229 4-85 5GAN Docked 4.4-7.8 d SmD3/SmD3
SmD1 146 16,288 1-109 5GAN Docked 4.8-7.8 h SmD1/SmD1
SmD2 110 12,856 15-108 5GAN Docked 5.2 - 8.0 j SmD2/SmD2
SmF 94 10,373 12-83 5GAN Docked 5.2 - 8.0 f SmF/SmF
SmE 96 9,659 10-92 5GAN Docked 5.4-8.0 e SmE/SmE
SmG 77 8,479 2-76 5GAN Docked 5.0-7.8 g SmG/SmG
U5 snRNA-L 214 68,847 4-144 De novo 3.8-7.6 U
Msl1 111 12,830 28-111 1A9N Homology modelled 6.6 - 8.8 Y U2-B"
Lea1 238 27,193 1-167 1A9N Homology modelled 5.6 - 8.6 W U2-A'
SmB 196 22,403 4-102 5GAN Docked 5.4-8.2 k SmB/SmB 
SmD3 110 11,229 4-85 5GAN Docked 6.0 - 8.2 n SmD3/SmD3 
SmD1 146 16,288 1-118 5GAN Docked 5.0 - 8.0 l SmD1/SmD1
U2 snRNP SmD2 110 12,856 15-108 5GAN Docked 5.0-7.6 m SmD2/SmD2
SmF 94 10,373 12-83 5GAN Docked 5.2-7.4 q SmF/SmF 
SmE 96 9,659 10-92 5GAN Docked 5.4-8.0 p SmE/SmE 
SmG 77 8,479 2-76 5GAN Docked 5.8 -8.2 r SmG/SmG 
U2 snRNA alil7 5) 363,824 Saou, De novo 3.8 - 6.0 7. 
1089-1169 
U6 U6 snRNA 112 36,088 1-102 De novo 3.6 -6.4 Vv 
Prp19 U-box 1-51 5,713 1-51 3JB9 Homology modelled ~20 t,u,V,W PRPF19/Cwf8 
Coiled-coil 52-143 10,247 78-143 3JB9 Homology modelled ~20 
WD40 144-503 40,646 171-501 3LRV Docked ~25-30 
Snt309 175 20,709 12-174 3JB9 Homology modelled ~20 s BCAS2/Cwf7 
Syf1 859 100,229 21-790 Idealised alpha helices 48-8 Ip SYF1/Cwf3 
NTC Clf1 Core 1-271 32,396 1-271 3JB9 Homology modelled & rebuilt 3.8-6.4 S CRNKL1/Cwf4
Periphery 272-687 50,067 277-556 Idealised alpha helices 5.2 -8.8 
Cef1 N-terminal 1-191 21,868 12-191 3JB9 Homology modelled & rebuilt 3.8 - 6.2 fe) CDCS5L/Cdc5 
Middle 192-505 65,905 - Not modelled - 
C-terminal 506-590 9,994 506-590 3JB9 Homology modelled ~20 
Isy1 235 32,992 1-96 De novo 3.8 - 6.2 G ISY1/Cwf12 
Prp45 379 42,483 32-224 3JB9 Homology modelled & rebuilt 4 - 8.4 K SNW1/Prp45
Prp46 451 50,700 111-445 3JB9 Homology modelled & rebuilt 3.4-6.6 J PLRG1/Prp5 
NTR Ecm2 364 40,925 6-324 3JB9 Homology modelled & rebuilt 4.0 - 7.0 N RBM22/Cwf5
Cwc2 339 38,431 3-252 3U1L Docked & rebuilt 3.6 - 6.0 M RBM22/Cwf2 
Cwc15 175 19,935 7-40 3JB9 Homology modelled & rebuilt 3.6 - 7.6 P CWC15/Cwf15 
Bud31 157 18,447 2-156 2MY1 Docked & rebuilt 3.6-6.8 L BUD31/Cwf14 
Yju2 278 32,312 2-115 De novo 3.8-5.4 D CCDC94/Cwf16 
Cwc21 N-terminal 1-64 7,057 2-50 De novo 3.8-7.4 R SRRM2/Cwf21 
Splicing factors Coiled-coil 65-135 8,724 64-111 2E62 Homology modelled 4.4-7.6 
Cwc22 MIF4G 1-288 33,187 11-262 4C9B Homology modelled & adjusted 4.6 - 8.2 H CWC22/Cwf22 
MA3 289-577 34,125 289-481 De novo 3.8 -7.0 
Cwc25 179 20,374 3-48 De novo 3.8-7.0 F CWC25/Cwf25 
Brr2 2,163 246,185 442-2163 4BGD Docked ~13 - 20 B 200K/Brr2 
Helicases Homology modelled & 
Prp16 1,071 121,653 338-978 2XAU domains fitted ~12-15 Q DHX38/Prp16 
5'-exon 20 6,683  (-16) - (-1) De novo 3.4-6.4 [= 
Substrate 
Intron 95 30,405 1-10; 54-76 De novo 3.4-7.2 I 
Resolution was calculated by averaging ResMap-calculated resolution voxels over each residue using Chimera. The resolution of residues at the 5th and 95th percentile for each chain then gave the resolution range for that chain. Da, Dalton.
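
The per-chain resolution range described in the table footnote can be illustrated with a short sketch. It assumes that the per-voxel local-resolution values for each residue have already been extracted (the authors did this with ResMap and Chimera; that step is not reproduced here) and that percentiles are taken with linear interpolation, which the source does not specify. The function names and the example numbers are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Percentile of pre-sorted values using linear interpolation (an assumption)."""
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = (len(sorted_values) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def residue_resolutions(voxels_by_residue: Dict[int, List[float]]) -> List[float]:
    """Average the local-resolution voxels belonging to each residue."""
    return [mean(voxels) for voxels in voxels_by_residue.values()]


def chain_resolution_range(per_residue: Sequence[float]) -> Tuple[float, float]:
    """Resolution range for one chain: residues at the 5th and 95th percentile."""
    ordered = sorted(per_residue)
    return percentile(ordered, 5.0), percentile(ordered, 95.0)


# Hypothetical chain with 21 residues, two voxels per residue.
toy_voxels = {i: [3.0 + 0.1 * i, 3.0 + 0.1 * i] for i in range(21)}
low, high = chain_resolution_range(residue_resolutions(toy_voxels))
print(f"resolution range: {low:.1f} - {high:.1f} Å")
```

Reporting the 5th and 95th percentiles rather than the minimum and maximum keeps a few poorly resolved residues from dominating the range quoted for a chain.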

Structural basis for the antifolding activity of a molecular chaperone


Chengdong Huang, Paolo Rossi, Tomohide Saio & Charalampos G. Kalodimos


Molecular chaperones act on non-native proteins in the cell to prevent their aggregation, premature folding or misfolding. 
Different chaperones often exert distinct effects, such as acceleration or delay of folding, on client proteins via mechanisms 
that are poorly understood. Here we report the solution structure of SecB, a chaperone that exhibits strong antifolding 
activity, in complex with alkaline phosphatase and maltose-binding protein captured in their unfolded states. SecB uses 
long hydrophobic grooves that run around its disk-like shape to recognize and bind to multiple hydrophobic segments 
across the length of non-native proteins. The multivalent binding mode results in proteins wrapping around SecB. This 
unique complex architecture alters the kinetics of protein binding to SecB and confers strong antifolding activity on the 
chaperone. The data show how the different architectures of chaperones result in distinct binding modes with non-native
proteins that ultimately define the activity of the chaperone.


Molecular chaperones rescue non-native proteins in the cell from
aggregation and assist with their folding or unfolding to maintain a
functional proteome. Despite common features, different families
of chaperones exhibit distinct activity and biological function.
Chaperones may exhibit foldase activity, whereby they accelerate
folding of client proteins, or antifolding (holdase) activity, whereby
they delay folding of client proteins, and the strength of the activity can
vary significantly. Molecular chaperones come in different sizes and a
great variety of molecular shapes. However, the scarcity of structural
data of chaperones in complex with non-native proteins has impeded
an understanding of how different chaperones engage these proteins
and how distinct chaperone architectures may alter activity.

SecB is a multitasking molecular chaperone in the cytosol that
exhibits an unusually strong antifolding activity. SecB is responsible
for maintaining secretory proteins in an unfolded, secretion-competent
state, as well as for their targeted delivery to the SecA ATPase.
SecB also acts as a generalized chaperone in the cell, and a secB null
mutation results in severe protein aggregation. Although extensively
studied with biochemical and biophysical techniques, the structural
and mechanistic details of how SecB recognizes non-native proteins
and how it exerts its antifolding activity are unknown. Recent
advances in nuclear magnetic resonance (NMR) and isotope labelling
approaches have enabled the characterization of large, dynamic
protein complexes including molecular chaperones. We have
exploited these approaches to determine the solution structure of
SecB in complex with client proteins captured in their unfolded
state, revealing a unique binding architecture among protein–protein
complexes.


Recognition sites in SecB and client proteins 

SecB exists as a tetramer organized as a dimer of dimers (dissociation
constant, Kd, of tetramer–dimer equilibrium is ~20 nM (ref. 24))
with an overall rectangular disk-like shape (Fig. 1a). Each subunit
consists of 155 residues (17.5 kDa) composed of a simple α/β fold.
The ¹H–¹⁵N- and ¹H–¹³C-correlated NMR spectra of the 70 kDa
Escherichia coli SecB labelled in methyl-bearing (Ala, Ile, Met, Leu,
Thr and Val) and aromatic (Phe, Trp and Tyr) residues are of high
quality and near-complete assignment has been obtained (Methods
and Extended Data Fig. 1a, b). We used maltose-binding protein (MBP)
(396 amino acids) and alkaline phosphatase (PhoA) (471 amino acids)
as SecB protein substrates. NMR analysis (Extended Data Figs 1c, d and
2a) showed that there are five distinct SecB-recognition sites in PhoA
(labelled a–e; Fig. 1b) and seven sites in MBP (labelled a–g; Fig. 1c),
with all sites being enriched in hydrophobic and aromatic residues, as
shown before.

To determine the client-binding sites in SecB we sought to identify
the SecB residues that show intermolecular nuclear Overhauser effects
(NOEs) to short fragments of PhoA and MBP encompassing
SecB-recognition sites. The SecB residues that interact with the substrates
(Fig. 1d, e) collectively form long, continuous hydrophobic grooves that
constitute the primary binding sites for non-native proteins (Fig. 1f).
Most prominent is a shallow groove running along the surface, formed
by helices α1 and α2, the helix-connecting loop, the crossover loop
and strand β2 (Fig. 1a, d–f). This groove, referred to as the primary
client-binding site (Fig. 1f), is ~60 Å long and exposes ~1,300 Å² of
hydrophobic surface per SecB subunit. In addition, a smaller surface
(~600 Å²) formed by residues emanating from helix α1 and strands
β1 and β4 also interacts with the unfolded proteins (Fig. 1e). This
small surface, the secondary client-binding site (Fig. 1f), features
several bulky non-polar amino acids. All four subunits combined, SecB
exposes ~7,600 Å² of hydrophobic surface that NMR has shown to
interact with non-native proteins (Fig. 1f).
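
The total quoted above follows directly from the per-subunit values. As a minimal check of the arithmetic, using only the approximate areas given in the text (the variable and function names are illustrative):

```python
# Approximate hydrophobic surface per SecB subunit, as quoted in the text (Å²).
PRIMARY_SITE_A2 = 1300    # primary client-binding groove
SECONDARY_SITE_A2 = 600   # secondary client-binding site
SUBUNITS_PER_TETRAMER = 4


def total_hydrophobic_surface(primary: float, secondary: float, subunits: int) -> float:
    """Sum both binding sites of one subunit and scale to the whole oligomer."""
    return (primary + secondary) * subunits


total = total_hydrophobic_surface(PRIMARY_SITE_A2, SECONDARY_SITE_A2, SUBUNITS_PER_TETRAMER)
print(f"client-binding hydrophobic surface per tetramer: ~{total} Å²")
```

Each subunit contributes about 1,900 Å², so the four subunits together give the ~7,600 Å² stated for the tetramer.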


SecB holds proteins in the unfolded state 

We used NMR spectroscopy to monitor at the residue level the effect 
of SecB on the folding of PhoA and MBP. Urea-treated PhoA and MBP 
refold quickly to their native state upon removal of urea (Extended Data 
Fig. 2b, c). Notably, SecB prevents the folding of PhoA or MBP, with 
both proteins adopting an unfolded conformation when bound to SecB 
(Extended Data Fig. 2b, c). The NMR data indicate that SecB-bound 
PhoA and MBP lack a tertiary structure and the regions of the protein 
substrates in contact with SecB do not form any secondary structure. 


Client proteins wrap around SecB 

To understand how SecB retains bound proteins in the unfolded state, 
we sought to structurally characterize the complexes of SecB with PhoA 
and MBP under native conditions. Multi-angle light scattering (MALS), 
isothermal titration calorimetry (ITC) and NMR all demonstrate that 
SecB forms stoichiometric complexes with PhoA and MBP (Extended
Data Fig. 3), as is the case with other large client proteins including
OmpA. The structure of the SecB–PhoA complex (~120 kDa)
was determined by NMR as detailed in Methods (Extended Data
Figs 4 and 5 and Extended Data Table 1) and is shown in Fig. 2. The
most remarkable feature is that PhoA wraps around SecB in an overall
arrangement that maximizes the interacting surface between the client
protein, which is held in an unfolded conformation, and the chaperone.
All of the grooves, the primary client-binding sites in SecB, in the four
subunits are occupied by specific PhoA sites (a, c, d and e) while the
short PhoA site b binds to the smaller, secondary binding site (Fig. 2).
The simultaneous engagement of all PhoA sites by SecB results in a
significant enhancement in the affinity of the unfolded protein for SecB
(Extended Data Fig. 3b, c), although the binding synergy is not strong.
This is probably because the linkers tethering the SecB-recognition
sites in PhoA are long and flexible (Fig. 1b and Extended Data
Fig. 4a), thereby reducing the effective concentration of the sites and
the measured avidity.

Analysis of the SecB–PhoA structure revealed how SecB recognizes
PhoA and how it accommodates all five PhoA sites (Fig. 1b) within one
SecB molecule (Fig. 3). Most of the PhoA site a residues (Thr5–Ala21)
are engaged in non-polar contacts with the SecB residues in the groove,
burying a total of ~2,250 Å² of surface (~1,900 Å² non-polar and
~350 Å² polar). Interestingly, helix α2 in SecB, which acts as a lid of
the binding groove, swings outwards by ~50° upon PhoA binding
(Fig. 4a). Together with an outward displacement of the first two
turns of helix α1, this widens the hydrophobic groove
significantly to accommodate the large non-polar side chains of the
client (Figs 3 and 4a). Moreover, the rearrangement of several side
chains lining the SecB groove allows some of the bulky PhoA residues
(for example, Leu8, Leu11 and Phe15) to bury their side chains into
the groove. Although most of the contacts are hydrophobic, several
of the polar groups in PhoA site a are poised to form hydrogen bonds
with polar SecB residues lining the groove (Fig. 3). PhoA site a binds
to SecB in an extended conformation, which maximizes the interacting
surface. Of note, this region of PhoA forms an α-helix when bound to
a hydrophobic groove in the SecA ATPase. Thus, SecB disfavours
the formation of any regular secondary structure of the bound client.

Figure 1 | Recognition sites in SecB and client proteins. a, Structure
of E. coli SecB (Protein Data Bank accession number 1QYN). The four
subunits (A–D) are coloured differently. The structural elements are
labelled on subunit A. b, c, Hydrophobicity plot of PhoA (b) and MBP
(c), as a function of their primary sequence. A hydrophobicity score
(Roseman algorithm, window = 9) higher than zero denotes increased
hydrophobicity. The sites identified by NMR to be recognized by SecB in
PhoA (labelled a–e) and MBP (labelled a–g) are highlighted in blue and
the residue range is shown at the bottom. d, The SecB residues identified
by intermolecular NOE data to interact with PhoA and MBP are shown
in ball-and-stick and coloured blue. e, Expanded view of the binding
sites in SecB subunit A is shown and the residues interacting with client
proteins are labelled. f, The hydrophobic residues in SecB are coloured
green, whereas all other residues are coloured white. The primary (P) and
secondary (S) client-binding sites in SecB are marked and their boundaries
delineated.
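
The hydrophobicity plots in Fig. 1b, c are described as windowed scores (window = 9). A minimal sketch of a sliding-window average of this kind is given below. It assumes that a per-residue hydrophobicity scale is supplied by the caller; the toy scale and sequence are hypothetical placeholders and are not the Roseman values or a real PhoA or MBP segment.

```python
from typing import Dict, List


def windowed_hydrophobicity(sequence: str, scale: Dict[str, float], window: int = 9) -> List[float]:
    """Mean hydrophobicity over each full window of `window` consecutive residues."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if len(sequence) < window:
        return []
    values = [scale[residue] for residue in sequence]
    running = sum(values[:window])
    scores = [running / window]
    for i in range(window, len(values)):
        # Slide the window by one residue: add the new value, drop the oldest.
        running += values[i] - values[i - window]
        scores.append(running / window)
    return scores


# Hypothetical scale and sequence, for illustration only.
TOY_SCALE = {"L": 1.0, "F": 1.0, "A": 0.5, "K": -1.0, "D": -1.0}
toy_sequence = "KDKALLFLAALFKDK"
profile = windowed_hydrophobicity(toy_sequence, TOY_SCALE, window=9)

# Windows scoring above zero would be flagged as hydrophobic stretches.
hydrophobic_windows = [i for i, score in enumerate(profile) if score > 0]
print(profile, hydrophobic_windows)
```

Each score is reported once per full window, so a sequence of length n yields n − 8 values for a window of 9; how the original plots treat the sequence termini is not stated in the source.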

PhoA site c is the longest SecB-recognition site in PhoA, consisting of
~50 residues (Fig. 1b). It binds to SecB in an extended conformation
spanning a distance of ~100 Å (Figs 2 and 3). The first 33 residues
(Phe93–Ala125) of PhoA site c bind exclusively within the groove of
one subunit, whereas the remaining PhoAᶜ (the superscript denotes the
corresponding site) residues (Ala126–Tyr138) extend across the surface
at the tetramerization interface. The total surface buried by the binding
of PhoA site c to SecB is ~5,150 Å² (~3,500 Å² non-polar and ~1,600 Å²
polar). PhoA site d encompasses a stretch of 30 residues
(Ala271–Thr309) and binds to SecB in an extended conformation, running
along the entire groove and spanning a distance of ~70 Å (Figs 2
and 3). The buried surface amounts to a total of ~4,200 Å², with
~2,800 Å² non-polar and ~1,400 Å² polar. PhoA site e (residues
Asn450–Lys471) binds to SecB in a very similar manner to PhoA site
a. PhoA site e is one of the regions that retains significant α-helical
structure in the unfolded PhoA. It binds to SecB, however, in an
extended conformation, further highlighting the tendency of SecB to
disrupt any regular secondary structure.

SecB can adjust the structure of the primary binding grooves to
allow longer substrates to fit in the groove. For example, whereas
~25 residues of the PhoA site d fit in the groove in an extended
conformation, more than 40 residues of the PhoA site c fit within the
same space (Fig. 4b). When the SecB helix α2 swings outwards upon
client binding (Fig. 4a), the movement not only widens the binding
groove but also exposes additional non-polar and polar surfaces that
are available for binding by the unfolded client.

It should be noted that structure determination of isolated PhoA sites
(PhoAᵃ, PhoAᶜ, PhoAᵈ and PhoAᵉ; Fig. 1b) in complex with SecB shows
that multiple molecules of the individual sites can be accommodated
within a SecB tetramer, owing to their relatively short length (Extended
Data Figs 3c and 5).

Figure 2 | Structure of the SecB–PhoA complex. Lowest-energy
structure of the SecB–PhoA complex. SecB is shown as a space-filling
model in grey. The five PhoA sites recognized by SecB are shown as
space-filling models and coloured per the colour code in the graphic of
the PhoA sequence at the top. The flexible regions of PhoA are shown as
a pink ribbon. Four views of the complex are shown related by a rotation
as indicated by the arrow. One PhoA molecule binds, which wraps around
SecB. The NMR data show that the linkers tethering the binding sites in
PhoA are flexible and do not interact with SecB (Extended Data Fig. 4a).

NMR structure determination of MBP sites d and e in complex with
SecB (SecB–MBPᵈ and SecB–MBPᵉ complexes; Extended Data Fig. 6
and Extended Data Table 1) showed that MBP binds to SecB in a very
similar fashion to PhoA. Thus, non-native proteins share a similar
binding mode for SecB. Analysis of the NMR spectra of labelled
full-length MBP in complex with SecB demonstrated that all seven binding
sites in MBP (Fig. 1c) are engaged by SecB in the SecB–MBP complex
(Extended Data Fig. 7a). NMR-driven modelling of the SecB–MBP
complex (Methods) shows that MBP, similarly to PhoA, wraps around
SecB using the chaperone's entire binding surface (Extended Data
Fig. 7b). Interestingly, the gain in avidity for MBP binding to SecB
(Kd 0.05 μM), compared with the isolated sites, appears to be an order of
magnitude stronger than in the case of PhoA (Extended Data Fig. 3c, f).
The reasons for the higher avidity are probably the larger interacting
interface in the complex with MBP (~130 PhoA residues compared
with ~240 MBP residues interacting with SecB) and the fact that the
SecB-recognition sites in MBP are tethered with linkers that are much
shorter than those in PhoA (Fig. 1b, c). Thermodynamic
analysis reveals a large and favourable enthalpy of binding for both
SecB–MBP and SecB–PhoA complexes, but with the overall affinity
being reduced by unfavourable entropy of binding (Extended Data
Fig. 3b, e).

Amino-acid substitutions at the client-binding sites in SecB 
resulted in a substantial decrease in the affinity for unfolded proteins 
and a marked decrease of its antifolding activity (Extended Data 
Fig. 8a-c). 

Figure 3 | Recognition of non-native PhoA by SecB. Expanded views of
the SecB–PhoA complex highlighting the binding details and contacts
that mediate recognition of the four PhoA sites (a, c, d and e) by SecB. The
colour code of the PhoA sites, shown as ball-and-stick, is as in Fig. 2. SecB
in the expanded views is shown as white ribbon and residues contacting
PhoA are displayed as blue ball-and-stick.


Chaperone-client binding mode modulates kinetics 

SecB may prevent the folding of a protein altogether, whereas other 
chaperones such as trigger factor (TF) cannot typically do so (Fig. 5a). 
We used surface plasmon resonance (SPR) and bio-layer interferometry 
(BLI) to measure the kinetics of interaction between unfolded MBP and 
PhoA with the SecB and TF chaperones (Fig. 5b and Extended Data 


a b 


PhoA site c 
PhoA site d 


Figure 4 | SecB structure adapts to client binding. a, Superposition of 
SecB structures (only subunit A is shown) in the unliganded state (blue) 
and bound to PhoA (pink). PhoA is not shown for clarity. The SecB helix 
«2 swings outward by ~50° upon PhoA binding. See also Extended Data 
Fig. 4e. b, Superposition of the structure of SecB subunits in complex with 
PhoA site c coloured in orange and with PhoA site d coloured in magenta. 
SecB is shown as a solvent-exposed surface in white. 
Figure 5 | Effect of chaperone-client binding mode on kinetics and
chaperone activity. a, Folding of urea-denatured MBP (pre form) in the
absence of a chaperone (blue) and in the presence of SecB (purple) or
TF (orange). Folding was monitored by Trp fluorescence at 23 °C. SecB
prevents the folding of MBP, whereas TF has a negligible effect. Both SecB
and TF are in fourfold excess over MBP. b, Kinetic analysis by BLI of the


Fig. 9a-f). Notably, MBP associates with SecB with an approximately
tenfold higher rate (k_on ~10⁶ M⁻¹ s⁻¹) than with TF (k_on ~10⁵ M⁻¹ s⁻¹)
and dissociates from SecB with an approximately fivefold slower rate
(k_off 0.01 s⁻¹) than from TF (k_off 0.05 s⁻¹). Of note, SecB prevents
folding of the cytosolic pre-form of MBP (Fig. 5a and Extended Data
Fig. 2c), but it cannot prevent folding of the mature (periplasmic) form
of MBP, which lacks the amino (N)-terminal signal sequence (Extended
Data Fig. 9g-i). This is because of the much faster intrinsic folding rate
(k_f) of the mature MBP (0.02 s⁻¹) compared with the k_f of the pre-form
of MBP (0.003 s⁻¹) (Extended Data Fig. 9g, h). Interestingly, an MBP
variant with a much slower folding rate (k_f 0.0008 s⁻¹) allows even TF
to delay its folding (Extended Data Fig. 9j), highlighting the importance
of the kinetics of client intrinsic folding and binding to the chaperone.
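
The kinetic comparison above can be condensed into an apparent dissociation constant, K_D = k_off/k_on, and into folding half-times, t_1/2 = ln 2/k_f. The following is a minimal sketch rather than part of the original analysis: the function names are hypothetical, and the k_on exponents (10⁶ and 10⁵ M⁻¹ s⁻¹) are an assumed reading chosen to respect the stated tenfold ratio.

```python
import math


def dissociation_constant(k_on: float, k_off: float) -> float:
    """Equilibrium K_D (in M) of a 1:1 complex: K_D = k_off / k_on."""
    if k_on <= 0:
        raise ValueError("k_on must be positive")
    return k_off / k_on


def half_time(rate_constant: float) -> float:
    """Half-time (in s) of a first-order process: t_1/2 = ln(2) / k."""
    if rate_constant <= 0:
        raise ValueError("rate constant must be positive")
    return math.log(2) / rate_constant


# Approximate rate constants for unfolded MBP. The k_on exponents are an
# assumption (illustrative order of magnitude), not a verified value.
K_ON_SECB, K_OFF_SECB = 1e6, 0.01  # M^-1 s^-1, s^-1
K_ON_TF, K_OFF_TF = 1e5, 0.05      # M^-1 s^-1, s^-1

kd_secb = dissociation_constant(K_ON_SECB, K_OFF_SECB)
kd_tf = dissociation_constant(K_ON_TF, K_OFF_TF)
print(f"K_D(SecB) = {kd_secb:.1e} M, K_D(TF) = {kd_tf:.1e} M")
print(f"K_D(TF) / K_D(SecB) = {kd_tf / kd_secb:.0f}")

# Intrinsic folding rate constants quoted in the text (s^-1).
for label, k_f in [("pre-form MBP", 0.003), ("mature MBP", 0.02), ("slow variant", 0.0008)]:
    print(f"{label}: folding half-time = {half_time(k_f):.0f} s")
```

Whatever the absolute values, a tenfold faster association combined with a fivefold slower dissociation corresponds to a roughly 50-fold lower K_D for SecB than for TF.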


SecB rescues aggregation-prone folded proteins 

To understand how SecB rescues cytosolic proteins and increases the
yield of natively folded proteins (Extended Data Fig. 9h), we used the
MBP^G32D/I33P variant (hereafter MBP^mut), which has a
high tendency to aggregate, especially at temperatures higher than 30 °C.
Notably, NMR shows that, in the presence of SecB, MBP^mut remains
folded and soluble even at temperatures as high as 50 °C (Extended
Data Fig. 8d, e). At such high temperatures, NMR showed that SecB
binds to and shields the transiently exposed unfolded state of MBP^mut,
resulting in its protection from aggregation (Extended Data Fig. 8f, g).
The aggregation-prone, transiently populated conformation of the
otherwise folded MBP^mut that is protected by SecB is only partly
unfolded and dissociates rapidly from SecB, giving rise to an
anti-aggregation effect as opposed to an antifolding effect.


Conclusions 

The present data demonstrate how the distinctive binding mode of 
SecB for non-native proteins (Fig. 2 and Extended Data Fig. 7b) enables
the chaperone to prevent folding of bound proteins (Fig. 5a).
Compared with TF (Fig. 5c, d), a chaperone for which its structure in 
complex with full length PhoA is known”®, the structural data explain 
how the overall architecture of the chaperone and the way it engages 
non-native proteins give rise to different chaperone activities (Fig. 5a). 


binding of MBP to SecB (left) and TF (right). c, Structure of SecB—PhoA 
and d, TF—PhoA complex”’. In both structures, the chaperone and PhoA 
are rendered as in Fig. 2. TF can only accommodate ~50 interacting PhoA 
residues per TF molecule, whereas one SecB molecule can accommodate 
the entire PhoA. 


Although both SecB and TF prevent aggregation and misfolding, as 
most molecular chaperones do, SecB has a much stronger antifolding 
activity than TF. Each TF molecule can accommodate a stretch of up to
~50 interacting residues of an unfolded polypeptide, whereas SecB 
can accommodate as many as ~250 interacting residues (Fig. 5c, d and 
Extended Data Fig. 7b). Because SecB recognizes and binds to multiple 
regions within an unfolded protein, long client proteins wrap around 
SecB to maximize the binding interface, thereby altering the binding 
kinetics. The overall binding architecture appears to be unique among 
known protein-protein complexes. More structural data on complexes 
of chaperones with proteins” are needed to discover the full repertoire 
of binding architectures and how they influence chaperone activity. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized. The investigators were not blinded to allocation during 
experiments and outcome assessment. 

Expression and preparation of proteins. The E. coli SecB gene was cloned into the 
pET-16b vector (Novagen) containing a His6-tag and a tobacco etch virus (TEV)
protease cleavage site at the N terminus. Protein samples of E. coli PhoA were
produced as described before. All E. coli MBP constructs were cloned into the
pET-16b vector containing a His6-tag and a TEV protease cleavage site at the
N terminus. The following MBP constructs were prepared in this study (residue 
numbers of the boundaries are in superscript): MBP!~°°°, mature MBP?7->"°, 
MBP22-°9, MBP*7-°9, MBP?7-164, MBpP!60-201 | MBP!98-265, MBP260-336, MBP?2!-396, 
and the MBP variants MBP^G32D/I33P, MBP^Y283D and MBP^V8G/Y283D (MBP mutants
are numbered on the basis of the amino-acid sequence of the mature form of
MBP). All constructs were transformed into BL21(DE3) cells. Isotopically
unlabelled protein samples were produced in cells grown in Luria-Bertani (LB)
medium at 37 °C in the presence of ampicillin (100 µg ml⁻¹) to an absorbance at
600 nm (A600 nm) of ~0.8. Protein expression was induced by the addition of 0.2 mM
isopropyl-β-D-1-thiogalactopyranoside (IPTG) and cells were allowed to grow
for 16 h at 18 °C. Cells were harvested at A600 nm ~1.5 and resuspended in lysis
buffer (50 mM Tris-HCl, 500 mM NaCl, pH 8 and 1 mM PMSF). Cells were
disrupted by a high-pressure homogenizer and centrifuged at 50,000g. Proteins
were purified using Ni Sepharose 6 Fast Flow resin (GE Healthcare), followed by
tag removal by TEV protease at 4 °C (incubation for 16 h) and gel filtration using
Superdex 75 16/60 or 200 16/60 columns (GE Healthcare). Protein concentration
was determined spectrophotometrically at 280 nm using the corresponding
extinction coefficient.

MALS experiments. MALS was measured using DAWN HELEOS-II (Wyatt 
Technology Corporation) downstream of a Shimadzu liquid chromatography 
system connected to a Superdex 200 10/300 GL (GE Healthcare) gel filtration 
column. The running buffer for SecB-PhoA complexes was 20 mM KPi (pH 7.0),
100 mM KCl, 4 mM BME and 0.5 mM EDTA, whereas that for SecB-MBP complexes
was 20 mM HEPES (pH 7), 150 mM KOAc and 0.05% NaN3. Protein samples at a
concentration of 0.05-0.2 mM were used. The flow rate was set to 0.5 ml min⁻¹
with an injection volume of 200 µl and the light scattering signal was collected at
room temperature (~23 °C). The data were analysed with ASTRA version 6.0.5
(Wyatt Technology Corporation). 

ITC experiments. ITC was performed using an iTC200 microcalorimeter (GE 
Healthcare) at temperatures ranging from 4 °C to 25 °C. All protein samples
were extensively dialysed against the ITC buffer containing 50 mM KPi (pH 7.0),
50 mM KCl, 0.05% NaN3 and 2 mM tris(2-carboxyethyl)phosphine (TCEP). All
solutions were filtered using membrane filters (pore size, 0.45 µm) and thoroughly
degassed for 20 min before the titrations. The 40-µl injection syringe was filled with
~0.05-1 mM of SecB solution and the 200-µl cell was filled with ~0.01-0.2 mM
PhoA or MBP. The slowly folding MBP^V8G/Y283D variant was used to measure
the affinity of MBP for SecB.
MBP^V8G/Y283D was unfolded in 8 M urea, 20 mM HEPES, pH 7, 150 mM KOAc
and 0.05% NaN3, and diluted 20 times to give a final concentration of 2.7 µM
immediately before loading into the cell. The solution containing SecB was
precisely adjusted to match the urea concentration. The titrations were performed
with a preliminary 0.2-µl injection, followed typically by 15 injections of 2.5 µl
each with time intervals of 3 min. The solution was stirred at 1,000 r.p.m. Data
for the preliminary injection, which are affected by diffusion of the solution
from and into the injection syringe during the initial equilibration period,
were discarded. Binding isotherms were generated by plotting heats of reaction
normalized by the moles of injectant versus the ratio of total injectant to total
protein per injection. The data were fitted with Origin 7.0 (OriginLab
Corporation). 
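
The construction of the binding isotherm described above can be sketched as follows. This is an illustrative simplification, not the Origin procedure used for the actual fits: the function name and the example heats are hypothetical, and the sketch ignores the correction for volume displaced from the cell that fitting software normally applies.

```python
from typing import List, Sequence, Tuple


def itc_isotherm(
    heats_ucal: Sequence[float],
    injection_volumes_ul: Sequence[float],
    syringe_conc_mM: float,
    cell_conc_mM: float,
    cell_volume_ul: float,
    discard_first: bool = True,
) -> List[Tuple[float, float]]:
    """Return (molar ratio, heat in kcal per mol of injectant) per injection."""
    if len(heats_ucal) != len(injection_volumes_ul):
        raise ValueError("one integrated heat is required per injection")
    cell_nmol = cell_conc_mM * cell_volume_ul  # mM x µl = nmol
    total_injected_nmol = 0.0
    points = []
    for heat, volume in zip(heats_ucal, injection_volumes_ul):
        injected_nmol = syringe_conc_mM * volume
        total_injected_nmol += injected_nmol
        # µcal per nmol is numerically equal to kcal per mol
        points.append((total_injected_nmol / cell_nmol, heat / injected_nmol))
    # The preliminary injection still adds to the total injectant,
    # but its heat is discarded, as described in the text.
    return points[1:] if discard_first else points


# Hypothetical heats (µcal) for a 0.2-µl preliminary injection and two 2.5-µl injections.
example = itc_isotherm(
    heats_ucal=[-0.5, -10.0, -9.0],
    injection_volumes_ul=[0.2, 2.5, 2.5],
    syringe_conc_mM=0.5,
    cell_conc_mM=0.02,
    cell_volume_ul=200.0,
)
for ratio, heat in example:
    print(f"molar ratio {ratio:.3f}: {heat:.2f} kcal/mol of injectant")
```

The resulting points are what a binding model would then be fitted to in order to extract the affinity, stoichiometry and enthalpy.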

Protein isotope labelling for NMR studies. Isotopically labelled samples 
for NMR studies were prepared by growing the cells in minimal (M9)
medium. Cells were typically harvested at A600 nm ~1.0. U-[2H,13C,15N]-
labelled samples were prepared for the backbone assignment of SecB and
large MBP fragments by supplementing the growing medium with 15NH4Cl
(1 g l⁻¹) and 2H7,13C6-glucose (2 g l⁻¹) in 99.9% 2H2O (CIL and Isotec).
The 1H-13C methyl-labelled samples were prepared as described.
α-Ketobutyric acid (50 mg l⁻¹) and α-ketoisovaleric acid (85 mg l⁻¹)
were added to the culture 1 h before the addition of IPTG. Met-[13CH3]- and
Ala-[13CH3]-labelled samples were produced by supplementing the medium
with [13CH3]-Met (50 mg l⁻¹) and [2H2,13CH3]-Ala (50 mg l⁻¹). For Thr labelling,
a Thr-auxotrophic cell strain was used, and the medium was supplemented
with [2H2,13CH3]-Thr (25 mg l⁻¹). For Phe, Tyr, and Trp labelling, U-[1H,13C]-
labelled amino acids were used. Alternative 13C-labelling of aromatic residues
was performed as described. All precursors and amino acids were added to the
culture 1 h before the addition of IPTG, except Ala, which was added 30 min before
induction.

NMR spectroscopy. NMR samples were typically prepared in 50 mM KPi 
(pH 7.0), 50 mM KCl, 0.05% NaN3, 5 mM BME and 7% D2O. NMR experiments
were recorded on Bruker 900, 850 and 700 MHz spectrometers. NMR spectra
were typically recorded at 10 °C for the isolated PhoA and MBP fragments and
at 35 °C for SecB and its complexes. Protein sample concentration ranged from
0.1 to 1.0 mM. All NMR spectra were processed using NMRPipe and analysed
using NMRView (http://www.onemoonscientific.com). 

NMR assignment of SecB. The SecB tetramer packs as a dimer of dimers and 
gives rise to two pairs of magnetically equivalent subunits: A and D give one set 
of resonances and subunits B and C give another set of resonances (Extended 
Data Fig. 1a). Sequential backbone assignment of SecB was achieved by the use 
of standard triple-resonance NMR pulse sequences. Three-dimensional (3D)
1H-15N NOESY experiments were used to confirm and extend the backbone
assignment within each subunit. Side-chain assignment for methyls and aromatic
residues was accomplished using the following NMR experiments: 3D (1H)-13C
heteronuclear multiple-quantum coherence (HMQC)-NOESY-1H-13C HMQC,
13C-edited NOESY-HSQC, 13C-edited HSQC-NOESY, 15N-edited NOESY-HSQC,
3D (1H)-13C HSQC-NOESY-1H-15N HSQC, and 3D (1H)-15N HSQC-NOESY-
1H-13C HSQC.

NMR assignment of PhoA and MBP in the unfolded state. We previously 
described the assignment strategy for unfolded PhoA”’. We followed a similar 
strategy to assign MBP in the unfolded state by making use of several MBP 
fragments that remain soluble and unfolded when isolated (Extended Data 
Fig. 1c): MBP2?-°9, MBP*-°9, MBP?7-!64, MBp!60-201 | MBP!98-265, MBP260-336 and 
MBP**!~*°®. Isolated MBP fragments encompassing the first 26 N-terminal residues 
(signal sequence) were not stable and this region could only be assigned in complex 
with SecB. Overlay of the spectra of the MBP fragments with the spectra of full- 
length MBP in 4M urea indicated very good resonance correspondence. This is 
expected because all of the fragments, as well as the MBP, in 4 M urea are unfolded. 
Resonance assignment obtained for the various fragments was transferred to full 
length MBP in urea, and ambiguities were resolved by the use of 3D NMR spectra. 
It should be noted that although resonance dispersion in unliganded PhoA and 
MBP is poor, complex formation with SecB alleviates this problem (for the PhoA 
and MBP residues in the SecB-binding regions) with the spectra being of high 
resolution (Extended Data Fig. 4c). 

Structure determination of SecB—PhoA and SecB— MBP complexes. 
Assignment of the resonances in SecB-PhoA was accomplished by first assigning
the complexes between SecB and the individual PhoA sites (SecB-PhoA^a,
SecB-PhoA^c, SecB-PhoA^d and SecB-PhoA^e). We used U-12C,15N-labelled samples that
contained specifically protonated methyl groups of Ala, Val, Leu, Met, Thr and Ile
(δ1) and protonated aromatic residues Phe, Tyr and Trp in an otherwise deuterated
background. The high sensitivity and resolution of the methyl region, combined 
with the high abundance of these nine amino acids in SecB (Extended Data Fig. 1a) 
and in the SecB-binding sites of PhoA and MBP, provided a large number of 
intermolecular NOEs for the SecB—PhoA and SecB—MBP complexes (Extended 
Data Table 1). Because PhoA in complex with SecB provided higher quality spectra 
than the spectra of MBP in complex with SecB, we determined first the structure 
of the SecB-PhoA complex (~120 kDa) by NMR. We initially characterized the
structure of each PhoA site (a-e) individually in complex with SecB (Extended
Data Fig. 5). The structures of SecB-PhoA^a, SecB-PhoA^c, SecB-PhoA^d and
SecB-PhoA^e were determined by NMR and are presented in Extended Data Fig. 5.
A large number of intermolecular NOEs were collected for each one of the 
complexes (Extended Data Table 1). Because of the relatively short length of the 
polypeptides encompassing the individual PhoA sites, multiple PhoA molecules 
bound to SecB, as shown in Extended Data Fig. 5. We also note that we detected 
the presence of a small number of intermolecular NOEs that were suggestive of 
alternative conformations of the PhoA sites bound to SecB. However, the intensity 
of these sets of NOEs was much weaker, indicating that the population of such 
alternative complexes is low. To solve the structure of the SecB—PhoA complex, we 
sought to determine how each one of the PhoA sites binds to SecB in the context 
of the full length PhoA. To circumvent the signal overlap in this large complex, we 
used samples where the two proteins were isotopically labelled in different amino 
acids. For example, in one of these samples SecB was labelled in the methyls of 
Leu, Val and Met, whereas PhoA was labelled in the methyls of Ile. Because of the
distinct chemical shifts of the 1H and 13C resonances of the methyls and the isotope
labelling scheme, it was possible to measure specific intermolecular NOEs between
SecB and PhoA (Extended Data Fig. 4b). Several of these samples were used to
determine as many intermolecular NOEs as possible. As expected, the NOEs were 
compatible with the structure of each PhoA site in complex with SecB, with the 
crucial difference that only one PhoA molecule could be accommodated in SecB.
Owing to its short length, the isolated PhoA site b (PhoA^b) binds to almost all of the
exposed hydrophobic surface of SecB, as determined by NMR. In the SecB-PhoA
complex, PhoA site b can only bind to the secondary binding site, as
determined by NOEs. To further corroborate the structure of the SecB-PhoA
complex we used PRE data (see below). The PRE-derived distances were fully 
compatible with the NOE data collected on SecB—PhoA. The structure of the 
SecB—PhoA complex was determined using the set of intermolecular NOEs 
collected directly in the complex and further refined using the intermolecular 
NOEs collected for the corresponding isolated PhoA sites in complex with SecB. 
It should be noted that because of the symmetry in SecB, the various PhoA sites 
may bind to any of the four SecB subunits. The final arrangement will be dictated 
by the length of the linkers tethering the SecB-recognition sites (as shown in 
Fig. 2), namely how far nearby recognition sites can bind from each other, and 
thus alternative routes of the polypeptide bound to SecB may be present. The only 
conceivable difference among the various conformations is the relative disposition 
of the PhoA sites. In all cases all of the SecB-recognition sites in PhoA are engaged 
by SecB in the complex and PhoA wraps around SecB. The NMR-driven structural 
model of the SecB—MBP complex (Extended Data Fig. 7b) was determined as 
follows: NMR analysis demonstrated that all seven recognition sites in MBP 
(labelled a—g) are bound to SecB in the SecB—MBP complex (Extended Data 
Fig. 7a). We have determined the high-resolution structures of MBP^d and MBP^e
in complex with SecB (Extended Data Fig. 6). Because of their length and the
short linker tethering the two sites, sites d and e most probably bind to the same
side of SecB. MBP site f is the longest one, consisting of ~90 residues, and is thus
entirely accommodated on the other side of SecB. With sites d, e and f occupying 
the primary binding sites, the other recognition sites (a, b, c and g), being much 
shorter, can be accommodated within the secondary client-binding sites on SecB. 
The structure of MBP sites d and e in complex with SecB was determined using 
the experimental intermolecular NOE data. The hydrophobic residues of the sites 
a, b, c, f, and g, showing the strongest effect upon SecB binding as determined 
by differential line broadening, were used to drive the docking of these sites to 
non-polar residues on SecB. The modelled structure shows that the entire MBP 
sequence can be accommodated within one SecB molecule. The structures 
of SecB in complex with PhoA and MBP were calculated with CYANA 3.97
(ref. 34), using NOE peak lists from 3D (1H)-13C HMQC-NOESY-1H-13C HMQC,
3D (1H)-15N HSQC-NOESY-1H-13C HSQC, 13C-edited NOESY-HSQC, and
15N-edited NOESY-HSQC. The 13Cα, 13Cβ, 13C′, 15N and NH chemical shifts
served as input for the TALOS+ program to extract dihedral angles (φ and ψ).
The side chains of SecB residues within or nearby the PhoA and MBP binding sites
were set flexible and their conformation was determined using intermolecular
NOEs collected for each one of the complexes. The SecB regions remote to
the binding sites were set rigid using the crystal structure coordinates for
E. coli SecB. The 20 lowest-energy structures were refined by restrained
molecular dynamics in explicit water with CNS. The percentage of residues
falling in favoured and disallowed regions, respectively, of the Ramachandran
plot is as follows: 99.4% and 0.6% for SecB-PhoA; 99.4% and 0.6% for
SecB-PhoA^a; 99.3% and 0.7% for SecB-PhoA^c; 99.2% and 0.8% for SecB-PhoA^d; 99.3%
and 0.7% for SecB-PhoA^e; 99.4% and 0.6% for SecB-MBP^d; and 99.4% and 0.6%
for SecB-MBP^e.

PRE experiments. PRE experiments were used to confirm the position of each 
individual PhoA binding site in the SecB—PhoA complex. First, a ‘Cys-free’ variant 
of PhoA was prepared by mutating the four naturally occurring Cys residues in 
PhoA (Cys190, Cys200, Cys308 and Cys358) to Ser. We then introduced a Cys 
residue to either end of each SecB-binding site in PhoA and prepared a total 
of ten single-Cys mutants: T5C, T23C, K65C, M75C, G91C, G140C, Q274C, 
C308, N450C and C472. The protein purified from the Ni-NTA column was quickly
concentrated and loaded onto a HiLoad 16/60 Superdex 200 gel filtration column
(GE Healthcare) using a buffer containing 50 mM KPi (pH 7.0), 150 mM NaCl
and 0.05% NaN3. Immediately after elution, the purified single-Cys PhoA
mutant was divided into two equal portions for parallel treatment with (1-oxyl-
2,2,5,5-tetramethyl-3-pyrroline-3-methyl)-methanethiosulfonate (MTSL, Toronto
Research Chemicals, Toronto) and a diamagnetic MTSL analogue, in a tenfold
molar excess at 4 °C for 16-20 h. MTSL was prepared as a 50 mM concentrated
stock in acetonitrile. Free MTSL was removed by extensive buffer exchange using a
Centricon Centrifugal Filter with a MWCO of 10,000 (Millipore) at 4 °C. The
MTSL-labelled PhoA protein samples were then concentrated and added to
the 2H-methyl-13CH3-labelled SecB at a final molar ratio of PhoA:SecB = 1:1.
2D 1H-13C HMQC spectra were recorded at 28 °C. A sample of SecB in complex
with PhoA cross-linked to a diamagnetic MTSL analogue was used as a reference.
Residues that experienced a significant reduction in NMR signal intensity (>50%
intensity loss) were identified as sites within 20 Å of the paramagnetic centre, whereas
residues experiencing more than 90% intensity loss were identified as sites
within 14 Å of the paramagnetic centre.
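
The two intensity-loss thresholds above amount to a simple classification rule. The sketch below is only an illustration of that rule; the function name is hypothetical, and the inputs are assumed to be peak intensities of the same SecB methyl resonance in the paramagnetic (MTSL) and diamagnetic reference samples.

```python
from typing import Optional


def pre_distance_bound(i_para: float, i_dia: float) -> Optional[float]:
    """Return an upper distance bound (in Å) to the paramagnetic centre.

    i_para: peak intensity with paramagnetic MTSL attached to PhoA.
    i_dia:  peak intensity with the diamagnetic MTSL analogue (reference).
    Returns None when the intensity loss is 50% or less (no restraint).
    """
    if i_dia <= 0:
        raise ValueError("reference intensity must be positive")
    loss = 1.0 - i_para / i_dia
    if loss > 0.9:   # more than 90% intensity loss
        return 14.0
    if loss > 0.5:   # more than 50% intensity loss
        return 20.0
    return None


# Hypothetical intensity pairs (paramagnetic, diamagnetic).
for pair in [(0.05, 1.0), (0.40, 1.0), (0.80, 1.0)]:
    print(pair, "->", pre_distance_bound(*pair))
```

Residues returning a bound would be treated as lying near the spin-labelled PhoA position, which is how the PRE data were used to check the placement of each PhoA site.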

Protein folding assays. Refolding experiments of MBP were performed as 
described before with some modifications. Briefly, MBP was first denatured in
8 M urea, 100 mM HEPES, 20 mM KOAc, 5 mM Mg(OAc)2, pH 7.4, and 0.05%
NaN3. Refolding was initiated by rapid dilution (20-fold) into the urea-free
buffer, and the refolding of MBP in the absence and presence of SecB or
TF was monitored by the change in the intrinsic Trp fluorescence. Fluorescence
intensity was measured using either a spectrofluorometer (FluoroMax-4, Horiba)
or a microplate reader (Infinite 200 PRO, Tecan). The excitation and emission
wavelengths were set to 295 nm and 345 nm, respectively. For measurements using
the FluoroMax-4 instrument, the MBP concentration in the 1-ml cuvette was
0.4 µM, whereas for the microplate reader experiments the concentration of MBP
was 4 µM in the 30-µl plate well. All fluorescence measurements were performed
at 25 °C. Data were fitted with Prism 6 (GraphPad) using a nonlinear
regression analysis equation.
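
As a rough illustration of how a folding rate constant can be extracted from such a fluorescence trace, the sketch below assumes a single-exponential approach to a known plateau, F(t) = F_plateau - A·exp(-k_f·t), and estimates k_f by a log-linear least-squares fit. This is a simplified stand-in, not the Prism nonlinear regression used for the actual data; the function name and the synthetic trace are hypothetical.

```python
import math
from typing import Sequence


def folding_rate_from_trace(
    times_s: Sequence[float], signal: Sequence[float], plateau: float
) -> float:
    """Estimate k_f (s^-1) assuming F(t) = plateau - A * exp(-k_f * t)."""
    xs, ys = [], []
    for t, f in zip(times_s, signal):
        remaining = abs(plateau - f)
        if remaining > 0:  # points at the plateau carry no rate information
            xs.append(t)
            ys.append(math.log(remaining))
    if len(xs) < 2:
        raise ValueError("at least two points away from the plateau are required")
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # ln|plateau - F| = ln A - k_f * t, so the slope equals -k_f
    return -sxy / sxx


# Synthetic, noise-free trace generated with k_f = 0.003 s^-1.
times = [60.0 * i for i in range(11)]
trace = [1.0 - math.exp(-0.003 * t) for t in times]
k_fit = folding_rate_from_trace(times, trace, plateau=1.0)
print(f"fitted k_f = {k_fit:.4f} s^-1")
```

With noisy experimental data, a direct nonlinear fit that also estimates the plateau is generally preferable, because the log transform amplifies noise near the plateau.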

Surface plasmon resonance (SPR). All SPR experiments were performed on 
a Biacore T200 system (GE Healthcare) using a NTA-coated Sensor Chip NTA 
(GE Healthcare) at a flow rate of 50 µl min⁻¹. The PhoA protein sample used
for SPR experiments was genetically constructed with a His) -tag at the carboxy 
(C) terminus and a flexible (Gly-Ser); linker repeat inserted in between to avoid 
steric hindrance. A single-cycle kinetic procedure was used to characterize the 
interaction of SecB and PhoA. The His-tagged PhoA was immobilized onto a 
NTA sensor chip, followed by washing with the running buffer containing 
50 mM phosphate, 50 mM KCl, pH 7, 0.05% NaN3, and 2 mM TCEP. The reducing
agent (TCEP) ensured that PhoA was in the unfolded state. SecB (analyte)
at a range of concentrations (0.1-25.6 µM) was injected, and data for a period
of 30 s of association and 60 s of dissociation were collected. MBP was prepared
with a His)9-tag at the N terminus followed by a flexible nine-residue linker to 
avoid steric hindrance. Multiple-cycle kinetic analysis was performed for the SPR 
experiments of the binding between MBP and SecB where each sample 
concentration was run in a separate cycle, and the surface was regenerated between 
each cycle using NTA regeneration buffer. His-tagged MBP was denatured in 8 M
urea and immobilized onto an NTA sensor chip. Urea was quickly washed away by
running buffer containing 20 mM HEPES, pH 7.4, 150 mM KOAc and 0.05% NaN3.
SecB was injected at concentrations ranging from 2.5 nM to 1.6 µM. The association
and dissociation times for data collection were set to 90 s and 120 s, respectively. After
urea was removed, MBP remained in the unfolded conformation for sufficient time 
to interact with SecB. This was confirmed by monitoring the refolding behaviour of 
MBP using an Infinite 200 PRO microplate reader (Tecan) at the temperature range 
of the experiments. All SPR experiments were repeated three times and highly 
reproducible data were obtained. The sensorgrams obtained from the assay channel 
were subtracted by the buffer control, and data were fitted using the Biacore T200 
evaluation software (version 1.0). 

Bio-layer interferometry (BLI). BLI experiments were performed using an Octet 
system (forteBIO) at room temperature (~23 °C). MBP was biotinylated using
the biotinylation kit EZ-Link NHS-PEG4-Biotin (Thermo Fisher Scientific). Biotin
label freshly dissolved in water was added to the protein solution to a final molar
ratio of 1:1 in buffer containing 50 mM KPi, pH 7, 150 mM NaCl, 0.05% NaN3,
and the solution was mixed at room temperature for 45 min. Unreacted biotin
label was removed by extensive buffer exchange using a Centricon Centrifugal
Filter with a MWCO of 10,000 (Millipore) at 4 °C using a buffer containing 20 mM
HEPES (pH 7), 150 mM KOAc and 0.05% NaN3. Biotin-labelled MBP (200 nM)
denatured in 8 M urea was immobilized onto the streptavidin (SA) biosensor, and 
the biosensors were subsequently blocked with biocytin in 8 M urea solution before 
a quick 30s dip into the urea-free buffer. SecB or TF previously diluted was applied 
in a dose-dependent manner to the biosensors immobilized with MBP. Bovine 
serum albumin (BSA) powder (Sigma-Aldrich) was added to a final concentration 
of 2% to avoid non-specific interaction. Parallel experiments were performed 
for reference sensors with no MBP captured and the signals were subsequently 
subtracted during data analysis. The association and dissociation periods were set 
to 2min and 5 min, respectively. 
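
For orientation, the shape of an association/dissociation trace of the kind recorded by BLI or SPR can be sketched with the standard 1:1 (Langmuir) binding model. This section does not state which model the instrument software used, so the 1:1 model is an assumption here; the function name and the parameter values are illustrative only.

```python
import math
from typing import List, Tuple


def simulate_sensorgram(
    k_on: float,
    k_off: float,
    analyte_conc_M: float,
    r_max: float,
    t_assoc_s: float,
    t_dissoc_s: float,
    dt_s: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """Simulate a 1:1 binding trace: association, then dissociation in buffer."""
    k_obs = k_on * analyte_conc_M + k_off          # observed association rate
    r_eq = r_max * k_on * analyte_conc_M / k_obs   # equilibrium response
    n_assoc = int(round(t_assoc_s / dt_s))
    n_dissoc = int(round(t_dissoc_s / dt_s))
    times, response = [], []
    for i in range(n_assoc + 1):
        t = i * dt_s
        times.append(t)
        response.append(r_eq * (1.0 - math.exp(-k_obs * t)))
    r_end = response[-1]  # response when the sensor is moved to buffer
    for j in range(1, n_dissoc + 1):
        t = j * dt_s
        times.append(t_assoc_s + t)
        response.append(r_end * math.exp(-k_off * t))
    return times, response


# Illustrative values: 2 min association and 5 min dissociation as in the BLI
# protocol; the rate constants and the 100 nM concentration are assumptions.
times, response = simulate_sensorgram(
    k_on=1e6, k_off=0.01, analyte_conc_M=100e-9,
    r_max=1.0, t_assoc_s=120.0, t_dissoc_s=300.0,
)
print(f"response at end of association: {response[120]:.3f}")
print(f"response at end of dissociation: {response[-1]:.3f}")
```

Fitting such curves across a concentration series is what yields k_on and k_off; a slower k_off appears as a flatter dissociation phase.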
Extended Data Figure 1 | NMR characterization of SecB and unfolded
MBP. a, SecB is enriched in hydrophobic amino acids, such as methyl-
bearing (Ala, Ile, Leu, Met, Thr and Val) and aromatic (Phe and Tyr) residues.
b, 1H-15N TROSY HSQC (left) and 1H-13C methyl HMQC (right) spectra
of [U-2H; Ala-13CH3; Met-13CH3; Ile-δ1-13CH3; Leu, Val-13CH3/13CH3;
Thr-13CH3]-labelled SecB. SecB packing gives rise to two pairs of
spectroscopically equivalent subunits: one pair is formed by subunits
A and D, and the other pair by subunits B and C. Select assignment is
included in the methyl spectrum with the asterisk indicating the other
pair. c, 1H-15N HSQC spectra of select MBP fragments spanning the entire
sequence of MBP. d, Secondary structure propensity (SSP) values
of unfolded MBP (extracted collectively from the fragments) plotted
as a function of the amino-acid sequence. A SSP score at a given residue
of 1 or −1 reflects a fully formed α-helical or β-structure (extended),
respectively, whereas a score of, for example, 0.5 indicates that 50% of the
conformers in the native-state ensemble of the protein are helical at that
position. The data show that several of the secondary structure elements in
the folded MBP retain some transient secondary structure in the unfolded
MBP fragments.
Extended Data Figure 2 | NMR characterization of PhoA and MBP
binding to SecB. a, To determine the SecB-recognition sites within PhoA
and MBP, 15N-labelled PhoA and MBP fragments were titrated with
unlabelled SecB. Owing to the labelling scheme and the size of SecB, the
intensity of the PhoA and MBP residues that are bound by SecB decreases
dramatically or disappears. Several titration points were recorded but here
only the spectra for the SecB:PhoA and SecB:MBP 1:1 are shown for two
select fragments. The 1H-15N HSQC spectra of PhoA or MBP are shown
(c) refolding in the presence and absence of SecB monitored by 1H-15N
HSQC spectra. Spectra of the ‘refolded’ state were recorded after rapid 
dilution of urea-treated MBP/PhoA in native buffer. Spectra of the 
‘unfolded’ state were recorded in urea. MBP and PhoA refolded in their 
native structure in the absence of SecB but were retained in the unfolded 
state in the presence of SecB. 
Extended Data Figure 3 | Energetics of SecB interaction with PhoA …
complex showing a stoichiometry of 1:1. e, ITC of SecB binding to MBP
and the energetics of binding. f, K_d values for complexes between select
MBP fragments encompassing the seven (a-g) SecB-recognition sites and …
but unfavourable entropy diminishes the overall binding.


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


Extended Data Figure 4 | NMR characterization of the SecB-PhoA
complex. a, 1H-15N TROSY HSQC spectra of PhoA in the unfolded state
(light blue) and in complex with SecB (grey). The unfolded state was
induced by the addition of reducing agent or urea and assigned and
characterized by NMR as shown before. Select resonance assignment of
SecB-recognition sites in PhoA is included (the colour is per the colour
code for each SecB-recognition site within PhoA; see Fig. 1b). There is an
excellent correspondence between the PhoA residues identified to bind to
SecB using the various PhoA fragments (Extended Data Fig. 2a) and the
residues of full-length PhoA that are bound to SecB in the SecB-PhoA
complex. All five SecB-recognition sites in PhoA (a-e) are engaged by
SecB in the SecB-PhoA complex. The PhoA regions that are not bound
to SecB (they retain their intensity in the complex) are all in an unfolded
conformation as suggested by their essentially identical chemical shifts to
the unfolded PhoA. b, Select strips from 13C-edited NOESY experiments
highlighting intermolecular NOEs in the SecB-PhoA complex. Owing to
severe resonance overlap in the 120 kDa SecB-PhoA complex, to identify
specific intermolecular NOEs we prepared samples wherein the two
protein partners are labelled in different types of methyl-bearing amino
acids. In this example, SecB was labelled in Leu, Met and Val residues and
PhoA in Ile residues. Thus, all NOEs detected between Leu/Val/Met and
Ile methyls are intermolecular. c, 1H-13C methyl HMQC spectra of SecB
in complex with PhoA fragments carrying the individual PhoA sites:
PhoA^a (green), PhoA^c (orange), PhoA^d (magenta) and PhoA^e (red).
Both SecB and PhoA fragments are [U-2H; Ala-13CH3; Met-13CH3;
Ile-δ1-13CH3; Leu, Val-13CH3/13CH3; Thr-13CH3]-labelled.
d, Representative strips from 13C-edited NOESY-HSQC and HMQC-
NOESY-HMQC NMR experiments. The NOE cross-peaks between
SecB and residues of PhoA fragments are designated by a dashed-line red
circle. e, Characteristic NOEs showing that the primary binding groove in
SecB is enlarged by the displacement of helix α2 as shown in Fig. 4a. For
example, the NOE between SecB residues Ala95 and Phe137 is consistent
with the closed conformation observed in apo SecB. This NOE is not
present in the SecB-PhoA complex because the two SecB residues have
moved apart as a result of the displacement of helix α2.
Step 1


NMR assignment of free SecB and unfolded PhoA 


Step 2 


NMR assignment and structure determination of the complexes between SecB and the 
individual SecB-recognition sites of PhoA (sites a, c, d, and e) 


SecB-PhoA^a: four PhoA^a molecules bind to SecB
SecB-PhoA^c: two PhoA^c molecules bind to SecB
SecB-PhoA^d: four PhoA^d molecules bind to SecB
SecB-PhoA^e: four PhoA^e molecules bind to SecB


SecB can accommodate multiple molecules of short substrates. The stoichiometry is determined by the number 
of PhoA-site molecules required to occupy all of the primary substrate-binding sites in SecB. 


Step 3 


To determine the structure of SecB in complex with full-length PhoA, we first assigned the SecB and the SecB-bound
PhoA resonances, assisted by the NMR information obtained for the individual complexes in Step 2. Next, we
identified unique intermolecular NOE patterns by using several differentially labelled SecB–PhoA samples. These
intermolecular NOEs were sufficient to position the sites of full-length PhoA on SecB. The final structure was refined by incorporating
a large number of NOEs collected for the individual complexes from Step 2.


Extended Data Figure 5 | Strategy for the structure determination of the SecB—PhoA complex. The three main steps are briefly described here. 
More details can be found in Methods. The lowest-energy NMR structures of the SecB complexes with the individual PhoA sites a, c, d and e are shown. 
The structural and NMR statistics for each structure are shown in Extended Data Table 1 and Methods. 
Extended Data Figure 6 | Structures of SecB with MBP sites. a, Lowest-energy
structure of SecB in complex with a MBP fragment encompassing
site d (MBPᵈ, residues 105-152). b, Lowest-energy structure of SecB in
complex with a MBP fragment encompassing site e (MBPᵉ, residues
165-210). SecB is shown as grey solvent-accessible surface (left) or as
white cartoon (right). Expanded views (right) of the contacts between
SecB and MBP. The SecB residues mediating contacts with MBP are shown
as blue ball-and-stick. In both complexes an additional MBP molecule
binds symmetrically to the opposite face of SecB but is not shown for
clarity.
Extended Data Figure 7 | NMR-driven model structure of SecB–MBP
complex. a, ¹H-¹⁵N TROSY HSQC spectra of MBP fragments (grey),
MBP fragments in complex with SecB (blue) and full-length MBP in
complex with SecB (magenta). The Gly (left) and Trp Nε (right) regions
are shown as examples because of the excellent dispersion and lack of 
severe resonance overlap. The various MBP fragments covering the entire 
MBP sequence (Extended Data Fig. 1c) are coloured grey and if they are 
located within a SecB-recognition site it is denoted in the superscript. 
The MBP residues that do not interact with SecB retain their intensity. 
These are residues located in regions that are not SecB-recognition sites 
(Fig. 1c). When these spectra are compared with the spectra of full- 
length MBP in complex with SecB (in magenta) a very good resonance 
correspondence is observed. Thus, two important observations can be 
made: first, all seven SecB-recognition sites (a–g) in MBP are engaged
by SecB in the SecB–MBP complex; and, second, the MBP regions
that do not interact with SecB in the SecB–MBP complex remain in an
unfolded state. The Trp spectra (right) provide direct evidence in support
of these observations: all Trp residues, with the exception of Trp155, are
located in SecB-recognition sites and they all interact with SecB in the
SecB–MBP complex. In contrast, Trp155 does not bind to SecB when
the corresponding MBP fragment was used, and this was also the case for
MBP. b, Modelled structure of the SecB–MBP complex. SecB is shown as
a solvent-exposed surface and MBP as a pink ribbon. The seven MBP sites 
recognized by SecB are shown as side-chain surface and coloured per the 
colour code in the graphic of the MBP sequence at the top. The structure 
of the complex was modelled as detailed in Methods. Briefly, as mentioned 
above, NMR analysis demonstrated that all seven recognition sites in MBP 
(labelled a–g) are bound to SecB in the SecB–MBP complex. We have
determined the high-resolution structure of MBPᵈ and MBPᵉ in complex
with SecB (Extended Data Fig. 6). Because of their length and the short
linker tethering the two sites, d and e most probably bind to the same side
of SecB. MBP site f is the longest one, consisting of ~90 residues, and is
thus entirely accommodated on the other side of SecB. With sites d, e and f 
occupying the primary binding sites, the other recognition sites (a, b, c 
and g), being much shorter, can be accommodated within the secondary 
client-binding sites on SecB. The structure of MBP sites d and e in complex 
with SecB was determined using the experimental intermolecular NOE 
data. The hydrophobic residues of the sites a, b, c, f and g showing the
strongest effect upon SecB binding, as determined by differential line 
broadening, were used to drive the docking of these sites to non-polar 
residues on SecB. The modelled structure shows that the entire MBP 
sequence can be accommodated within one SecB molecule. 
Extended Data Figure 8 | Anti-aggregation activity of SecB. a, A triple
amino-acid substitution in the SecB (V40A/L42A/L44A) client-binding 
site was prepared and is referred to as the triple mutant SecB (SecB™). 
ITC profile of the binding of PhoA to SecB™ to be compared with 
PhoA binding to wild-type SecB (Extended Data Fig. 3b). The triple 
substitution causes a 40-fold reduction in the affinity of SecB for PhoA. 
b, Fluorescence-monitored MBP folding in the absence of SecB (blue), in 
the presence of wild-type SecB (green) and in the presence of SecB™ (red). 
The triple mutant diminishes significantly the antifolding activity of SecB. 
c, ¹H-¹⁵N TROSY HSQC spectra of MBP refolded in the absence (blue)
and presence of SecB™ (red). In contrast to wild-type SecB (Extended Data
Fig. 2c), SecB™ cannot hold MBP in the unfolded state. d, ¹H-¹³C methyl
HMQC spectra of MBP™" (blue) and in the presence of SecB (red) recorded
at 22°C. The MBP mutant (MBP™) carries two amino-acid substitutions
(G32D/I33P) that render the protein prone to aggregation”, especially at
temperatures above 30°C. No NMR signal of MBP™ can be detected at 
temperatures above 30°C and the protein precipitates in the NMR tube. At 
22°C, MBP™ is folded, as evidenced by the resonance dispersion in the 
NMR spectra, and does not interact with SecB. e, ¹H-¹³C methyl HMQC
spectrum of MBP™" in the presence of SecB recorded at 50°C. MBP™"
suffers heavy precipitation and aggregation at temperatures higher than
30°C, but in the presence of SecB it is stable and folded even at temperatures
as high as 50°C. f, ¹H-¹⁵N TROSY HSQC spectra of SecB (blue) and in
the presence of MBP™ (orange) at 42°C, indicating binding. Because of
the elevated temperature, a significant unfolded population of MBP™
is present, which binds to SecB (see main text). g, Mapping of the sites
(orange) used by SecB to interact with MBP™, on the basis of the chemical
shift perturbation data from the spectra in f.
Extended Data Figure 9 | Kinetics of PhoA and MBP interaction with
SecB and TF. a–c, SPR analysis of the interaction of SecB with PhoA
(a) and MBP at 20°C (b) and 30°C (c). Single-cycle and multiple-cycle
procedures were used for the SPR analysis of SecB with PhoA and MBP,
respectively. d–f, BLI analysis of the binding of MBP to SecB (d),
SecB™ (e) and TF (f). His-tagged PhoA or MBP (for SPR) or biotinylated
MBP (for BLI) was immobilized on an NTA chip (SPR)
or streptavidin biosensor (BLI) and interactions were examined at
different SecB or TF concentrations as indicated. Binding is reported
in response units (RU) for SPR and wavelength shift (nanometres) for
BLI as a function of time. g, h, Effect of SecB on the kinetics of
MBP folding. Fluorescence-monitored folding of MBP (pre form) (g) and
mature MBP (h) in the absence (blue) and presence of one- (green) and
fourfold (purple) excess of SecB. SecB does not appreciably delay folding
of mature MBP. In fact, SecB excess appears to increase the yield of
soluble, folded mature MBP (purple). i, ¹H-¹⁵N TROSY HSQC spectra of
mature MBP refolded in the absence (blue) and presence of SecB (red).
SecB cannot retain the mature MBP unfolded. j, Fluorescence-monitored
folding of the slowly folding MBP(Y283D) variant in the absence (blue)
and presence of one- (green) and fivefold (orange) TF. As elaborated in
the main text, TF does not delay folding of pre-MBP (Fig. 5a). However,
it does delay folding of an inherently slowly folding MBP mutant
(MBP(Y283D)), thus highlighting the importance of the intrinsic folding of
the client protein and its association rate to the chaperone.
Extended Data Table 1 | NMR and refinement statistics for the SecB complexes with PhoA and MBP 


SecB- 
PhoA 
NMR distance and dihedral constraints
Distance restraints
Total NOE 1362
Inter-residue
Sequential (|i-j| = 1) 343
Non-sequential (|i-j| > 1) 1019
Inter-molecule 171
Total dihedral angle restraints 1169
phi 583
psi 586
Structure statistics
Violations (mean and s.d.)
Distance constraints (Å) 0.012 ± 0.047
Dihedral angle constraints (°) 0.42 ± 1.4
Max. dihedral angle violation (°) 26.8
Max. distance constraint violation (Å) asa
Average pairwise r.m.s.d. (Å)
Heavy 44
Backbone 4.0


SecB- 
PhoA? 


2.1 
1.5 


SecB- 
SecB- 
1338 


24 
1.7 


SecB- 
SecB- 
MBP® 


1446 


362 
1084 
22 
976 
488 
488 


0.016 
+0.054 
0.31 
+0.96 
9.9 
0.95 


2.9 
2.1 


Statistics for each structure were computed for the ensembles of 20 deposited structures. Ordered residue ranges (S(φ) + S(ψ) > 1.8), 10–141 (of SecB subunits A, B, C and D); backbone (heavy atom)
root mean squared deviation (r.m.s.d.) was ~1.0 (1.3) Å within the specified range for all complexes. Additionally, the r.m.s.d. within the PhoA fragments is reported for each structure. Average distance
constraint violations were calculated with PDBStat*?.
Compression and ablation of the photo-irradiated 
molecular cloud the Orion Bar 


Javier R. Goicoechea, Jérôme Pety, Sara Cuadrado, José Cernicharo, Edwige Chapillon, Asunción Fuente,
Maryvonne Gerin, Christine Joblin, Nuria Marcelino & Paolo Pilleri


The Orion Bar is the archetypal edge-on molecular cloud surface 
illuminated by strong ultraviolet radiation from nearby massive 
stars. Our relative closeness to the Orion nebula (about 1,350 light 
years away from Earth) means that we can study the effects of stellar 
feedback on the parental cloud in detail. Visible-light observations 
of the Orion Bar¹ show that the transition between the hot ionized
gas and the warm neutral atomic gas (the ionization front) is spatially 
well separated from the transition between atomic and molecular gas 
(the dissociation front), by about 15 arcseconds or 6,200 astronomical 
units (one astronomical unit is the Earth-Sun distance). Static 
equilibrium models”* used to interpret previous far-infrared and 
radio observations of the neutral gas in the Orion Bar*° (typically 
at 10-20 arcsecond resolution) predict an inhomogeneous cloud 
structure comprised of dense clumps embedded in a lower-density 
extended gas component. Here we report one-arcsecond-resolution 
millimetre-wave images that allow us to resolve the molecular 
cloud surface. In contrast to stationary model predictions’, there 
is no appreciable offset between the peak of the H₂ vibrational
emission (delineating the H/H₂ transition) and the edge of the
observed CO and HCO⁺ emission. This implies that the H/H₂ and
C⁺/C/CO transition zones are very close. We find a fragmented
ridge of high-density substructures, photoablative gas flows and 
instabilities at the molecular cloud surface. The results suggest that 
the cloud edge has been compressed by a high-pressure wave that 
is moving into the molecular cloud, demonstrating that dynamical 
and non-equilibrium effects are important for the cloud evolution. 
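
As a quick check of the angular scales quoted above, the following Python sketch converts an angle on the sky into a projected length with the small-angle approximation. The distance (about 1,350 light years) and the 15-arcsecond separation come from the text; the function names and the rounded conversion constants are assumptions of this sketch.

```python
LY_PER_PC = 3.2616       # light years in one parsec (rounded)
AU_PER_PC = 206264.806   # astronomical units in one parsec


def projected_size_au(theta_arcsec: float, distance_ly: float) -> float:
    """Projected size in au subtended by theta_arcsec at distance_ly.

    By definition of the parsec, 1 arcsecond at 1 pc corresponds to 1 au,
    so the size in au is the angle in arcseconds times the distance in pc.
    """
    distance_pc = distance_ly / LY_PER_PC
    return theta_arcsec * distance_pc


def au_to_pc(size_au: float) -> float:
    """Convert a length from astronomical units to parsecs."""
    return size_au / AU_PER_PC


front_separation_au = projected_size_au(15.0, 1350.0)   # ionization to dissociation front
resolution_au = projected_size_au(1.0, 1350.0)          # one-arcsecond ALMA resolution
print(round(front_separation_au), round(resolution_au))
```

With these inputs the 15-arcsecond separation comes out close to the 6,200 au quoted in the text, and one arcsecond corresponds to roughly 400 au.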

The Atacama Large Millimeter/submillimeter Array (ALMA) radio
telescope allows us to resolve the transition from atomic to molecular
gases at the edge of the Orion molecular cloud’®’, which is directly
exposed to energetic radiation from the Trapezium stars (Fig. 1). The
strong ultraviolet field drives a blister ‘H II region’ (hot ionized
hydrogen gas or H⁺) that is eating its way into the parental molecular cloud.
At the same time, flows of ionized gas stream away from the cloud
surface at about 10 km s⁻¹ (roughly the speed of sound, c_HII, at
T ≈ 10⁴ K)!°!. The so-called photon-dominated or photodissociation
region (PDR"; see Extended Data Fig. 1) starts at the H II region/cloud
boundary where only far-ultraviolet radiation penetrates the ‘neutral’
cloud, that is, stellar photons with energies below 13.6 eV that cannot
ionize H atoms but do dissociate molecules (H₂ + photon → H + H),
and ionize elements such as carbon (C + photon → C⁺ + electron).
Inside the PDR, the far-ultraviolet photon flux gradually decreases due
to dust grain extinction and H₂ line absorption, as do the gas and dust
temperatures!*. These gradients produce a layered structure with
different chemical compositions as one moves from the cloud edge to the
interior®’*. The ionized nebula (the H II region) can be traced by the
visible light emission from atomic ions (such as the [S II] 6,731 Å
electronic line). The ionization front is delineated by the [O I] 6,300 Å line
of neutral atomic oxygen!» (Fig. 1). Both transitions are excited by
high-temperature collisions with electrons. Therefore, their intensities
sharply decline as the electron abundance decreases by a factor of
about 10⁴ at the H⁺/H transition layer. In Fig. 1b, the dark cavity
between the ionization front and the HCO⁺-emitting zone is the
neutral ‘atomic layer’ (x(H) > x(H₂) ≫ x(H⁺), where x is the species
abundance with respect to H nuclei). This layer is very bright in
mid-infrared polycyclic aromatic hydrocarbon emission, and cools via
the far-infrared O and C⁺ emission lines'*. Although most of


Figure 1 | Multiphase view of the Orion nebula and molecular cloud.
a, Overlay of the HCO⁺ J = 3–2 emission (red) tracing the extended
Orion molecular cloud. The hot ionized gas surrounding the
Trapezium stars is shown by the [S II] 6,731 Å emission (green). The
interfaces between the ionized and the neutral gas, the ionization fronts, are
traced by the [O I] 6,300 Å emission (blue). Both lines were imaged with
VLT/MUSE¹⁵. The size of the image is approximately 5.8′ × 4.6′. BN/KL,
Becklin–Neugebauer/Kleinmann–Low star-forming region. b, Close-up of the
Orion Bar region imaged with ALMA in the HCO⁺ J = 4–3 emission (red).
The black region is the atomic layer.
the electrons are provided by the ionization of C atoms (thus
x(e⁻) ≈ x(C⁺) ≈ 10⁻⁴)!*!6, the gas is mainly heated by collisions with
energetic (about 1 eV) electrons photo-ejected from small grains and
polycyclic aromatic hydrocarbons*"*. For the strong far-ultraviolet
radiation flux impinging on the Orion Bar*”, which is approximately
4.4 × 10⁴ times the average flux in a local diffuse interstellar cloud!®, a
gas density n_H = n(H) + 2n(H₂) of (4–5) × 10⁴ cm⁻³ in the atomic layer
is consistent with the observed separation between the ionization and
dissociation fronts**.

ALMA resolves the sharp edge where the HCO⁺ and CO emission
becomes intense (Fig. 2). These layers spatially coincide with the
brightest peaks of H₂ vibrational emission (H₂*) tracing the H/H₂ transition
(Extended Data Fig. 2). Therefore, the H/H₂ and the C⁺/C/CO
transition zones occur very close to each other. Static equilibrium models
of a PDR with n_H = (4–5) × 10⁴ cm⁻³ predict*”!4, however, that the
C⁺/CO transition should occur deeper inside the molecular cloud
because of the lower ionization potential of C atoms (11.3 eV) and
because CO may not self-shield from photodissociation as effectively
as H₂. The spatial coincidence of several H₂ and HCO⁺ emission peaks
shows that the formation of carbon molecules starts at the surface of
the cloud (initiated by reactions of C⁺ with H₂). This shifts the C⁺/CO
transition closer to the ionization front and suggests that dynamical
effects are important!”®.
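
The two energy thresholds quoted in the text (13.6 eV for ionizing hydrogen and 11.3 eV for ionizing carbon) can be translated into wavelengths with a short Python sketch. The physical constants are standard SI values; the function name is an assumption of this sketch, which only illustrates the far-ultraviolet window in which photons ionize C but not H.

```python
H_PLANCK = 6.62607015e-34    # Planck constant, J s
C_LIGHT = 2.99792458e8       # speed of light, m/s
EV_IN_JOULE = 1.602176634e-19


def wavelength_nm(energy_ev: float) -> float:
    """Wavelength in nanometres of a photon with the given energy in eV."""
    return H_PLANCK * C_LIGHT / (energy_ev * EV_IN_JOULE) * 1e9


lyman_limit_nm = wavelength_nm(13.6)    # shorter wavelengths ionize H
carbon_limit_nm = wavelength_nm(11.3)   # shorter wavelengths ionize C
print(f"{lyman_limit_nm:.1f} nm, {carbon_limit_nm:.1f} nm")
```

Photons between these two wavelengths (roughly 91 to 110 nm) cannot ionize hydrogen but can still ionize carbon, which is why C⁺ persists into the neutral layer.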

To zero order, the CO J = 3–2 (where J is the rotational quantum
number) line intensity peak (Tpeak(CO 3–2), in K) is a measure of the gas
temperature T in the molecular cloud (δx > 15″ in Fig. 2c, where δx is the
distance to the ionization front). The HCO⁺ J = 4–3 integrated line
intensity (W(HCO⁺ 4–3), in K km s⁻¹), however, scales with the gas density n_H
(see Methods and Extended Data Fig. 3). Although the Tpeak(CO 3–2) image
shows a relatively homogeneous temperature distribution, the
W(HCO⁺ 4–3) image shows small-scale structure (Fig. 2a, b). In particular,
ALMA resolves several bright HCO⁺ emission peaks (filamentary
substructures, some akin to globulettes) surrounding the dissociation
front and roughly parallel to it. These substructures are surrounded by
a lower-density gas component, with n_H ≈ (0.5–1.0) × 10⁵ cm⁻³,
producing an extended (ambient) emission*°. The HCO⁺ substructures
Figure 2 | ALMA images of the Orion Bar.
a, Integrated intensity of the HCO⁺ J = 4–3 line.
b, CO J = 3–2 line peak. Compared with Fig. 1,
a and b have been rotated 127.5° anticlockwise to
bring the incident ultraviolet radiation from the
left (see Extended Data Fig. 1). The dashed curves
and the vertical dot-dashed lines delineate the
ionization and dissociation fronts, respectively’.
c, Vertically averaged intensity cuts perpendicular
to the Orion Bar in W(HCO⁺ 4–3) (blue curve) and
Tpeak(CO 3–2) (red curve). d, Probability distribution of
W(HCO⁺ 4–3) (proportional to the gas density) in the
observed field (magenta triangles) and in the
compressed layers (black squares).



(with a typical width of about 2″ ≈ 4 × 10⁻³ pc) are located at the
molecular cloud edge, and are different from the bigger (5″–10″) condensations
previously seen deeper inside the molecular cloud®””.

To investigate the stratification of molecular emission inside the
cloud, we constructed averaged emission cuts perpendicular to the
Orion Bar. Three emission maxima are resolved in the W(HCO⁺ 4–3) crosscuts
at roughly periodic separations of about 5″ (approximately 0.01 pc;
Fig. 2c). Excitation models show that the average physical conditions
that reproduce the mean CO and HCO⁺ intensities towards the
dissociation front (at δx ≈ 15″) are T ≈ 200–300 K and
n_H ≈ (0.5–1.5) × 10⁶ cm⁻³ (see Methods and Extended Data Fig. 3). Hence, the
over-dense substructures have compression factors of about 5–30 with
respect to the ambient gas component and are subject to high thermal
pressures (P/k = n_H T ≈ 2 × 10⁸ K cm⁻³). The three periodic maxima
suggest that a high-pressure compression wave exists, and is
moving into the molecular cloud. This wave may be associated with an
enhanced magnetic field (several hundred microgauss; see Methods).
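
The compression factors and thermal pressure quoted in this paragraph follow from simple arithmetic, sketched below in Python. The sketch assumes the density ranges are read as (0.5–1.5) × 10⁶ cm⁻³ for the over-dense substructures and (0.5–1.0) × 10⁵ cm⁻³ for the ambient gas, with T = 200–300 K; the variable names are illustrative only.

```python
DENSE_N_H = (0.5e6, 1.5e6)      # cm^-3, over-dense substructures (assumed reading)
AMBIENT_N_H = (0.5e5, 1.0e5)    # cm^-3, ambient component (assumed reading)
TEMPERATURE = (200.0, 300.0)    # K, range from the excitation models


def pressure_over_k(n_h_cm3: float, temperature_k: float) -> float:
    """Thermal pressure divided by Boltzmann's constant, in K cm^-3.

    Follows the text's convention P/k = n_H * T.
    """
    return n_h_cm3 * temperature_k


# Smallest and largest density contrast between dense and ambient gas.
compression_min = DENSE_N_H[0] / AMBIENT_N_H[1]
compression_max = DENSE_N_H[1] / AMBIENT_N_H[0]

# Range of thermal pressures spanned by the dense gas.
pressure_low = pressure_over_k(DENSE_N_H[0], TEMPERATURE[0])
pressure_high = pressure_over_k(DENSE_N_H[1], TEMPERATURE[1])

print(compression_min, compression_max)
print(f"{pressure_low:.1e} to {pressure_high:.1e} K cm^-3")
```

Under these assumptions the contrast spans 5 to 30, and the quoted P/k of about 2 × 10⁸ K cm⁻³ falls inside the computed pressure range.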

In the very early stages of an expansion of the H II region into
molecular clouds, theory predicts that the ionization and dissociation
fronts are co-spatial (an R-type front'*°). Soon after (t < 1,000 yr), the
expansion slows down and the dissociation front propagates ahead of
the ionization front and into the molecular cloud'®”. The ionization
front changes to a D-type front (a compressive wave travels ahead of
the ionization front'®”° and the neutral gas becomes denser than the
ionized gas). For a front advancing at a speed'”'* of 0.5–1.0 km s⁻¹, the
observed separation between the ionization and dissociation fronts in
the Orion Bar implies a crossing time of 25,000–50,000 yr. Later in the
expansion phase, when t is several times greater than the dynamical
time t_dyn of the expanding H II region (the ratio of the initial radius of
the H II region, the so-called Strömgren radius, to c_HII), the
compressive wave slowly enters the molecular cloud?!” (t_dyn ≈ 0.2 pc
per 10 km s⁻¹ ≈ 20,000 yr for the Orion Bar). Observational evidence
of such dynamical effects is scarce.
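
The timescales in this paragraph can be reproduced to order of magnitude with the Python sketch below. It assumes the front separation is the roughly 6,200 au quoted in the abstract; the text does not state the exact separation used for the crossing time, so the result is expected to agree only approximately with the quoted 25,000–50,000 yr. Function names and unit constants are assumptions of this sketch.

```python
KM_PER_AU = 1.495978707e8
KM_PER_PC = 3.0856776e13
SECONDS_PER_YEAR = 3.15576e7   # Julian year


def travel_time_yr(distance_km: float, speed_km_s: float) -> float:
    """Time in years needed to cover distance_km at a constant speed."""
    return distance_km / speed_km_s / SECONDS_PER_YEAR


separation_km = 6200.0 * KM_PER_AU               # assumed front separation
crossing_fast = travel_time_yr(separation_km, 1.0)
crossing_slow = travel_time_yr(separation_km, 0.5)

# Dynamical time: 0.2 pc divided by the ~10 km/s sound speed of the ionized gas.
t_dyn = travel_time_yr(0.2 * KM_PER_PC, 10.0)

print(round(crossing_fast), round(crossing_slow), round(t_dyn))
```

The dynamical time comes out near 20,000 yr, as stated, and the crossing time lands at a few tens of thousands of years, the same order as the quoted range.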

In the compressed layers suggested by ALMA (where δx is between
7″ and 30″ in Fig. 2a), the distribution of the gas densities follows a
relatively narrow log-normal distribution (Fig. 2d). This is consistent
with magnetohydrodynamic simulations of non-gravitating turbulent
clouds?374. When the entire observed field is analysed, the shape of
the distribution is closer to a double-peaked log-normal distribution. 
This resembles specific simulations in which the cloud compression 
is induced by the expansion of the ionized gas”*”° (and not by a strong 
turbulence). Searching for further support for this scenario, we inves- 
tigated the degree of turbulence and compared the different contri- 
butions of the gas pressure in the PDR (Extended Data Table 1). The 
inferred non-thermal (turbulent) velocity dispersion, about 1 km s⁻¹,
results in a moderate Mach number of <1 (the ratio of the turbulent
velocity dispersion to the local speed of sound), that is, only a gentle
level of turbulence. The thermal pressure exerted by the H II region
at the H⁺/H interface’ is several times higher than the turbulent and
thermal pressures in the ambient molecular cloud. These pressure dif- 
ferences, together with the detection of over-dense substructures close 
to the cloud edge, agree with the ultraviolet radiation-driven compres- 
sion scenario”>”°. Whether these substructures could be the seeds of 
future star-forming clumps (for example, by merging into massive 
clumps) is uncertain?*””. Gravitational collapse is not apparent from 
their density distribution (no high-density power-law tail”*°). Indeed, 
their estimated masses (less than about 0.005 M☉, where M☉ is the
mass of the Sun) are much lower than the mass needed to make them 
gravitationally unstable. Even so, the increased ultraviolet shielding 
produced by the ridge of high-density substructures probably con- 
tributes to protecting the molecular cloud from photodestruction for 
longer periods. 
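
To give a feel for the mass scale involved, the sketch below estimates the mass of a single substructure in Python. It is only an illustration under strong assumptions that are not from the paper: a uniform sphere whose diameter is the typical 2″ width (about 0.004 pc), a density of 10⁶ cm⁻³ in H nuclei, and a mean mass of 1.4 proton masses per H nucleus to account for helium.

```python
import math

CM_PER_PC = 3.0856776e18
M_HYDROGEN_G = 1.6735575e-24   # mass of a hydrogen atom, g
M_SUN_G = 1.98847e33
MASS_PER_H_NUCLEUS = 1.4       # assumed helium correction


def sphere_mass_msun(diameter_pc: float, n_h_cm3: float) -> float:
    """Mass in solar masses of a uniform sphere of gas (illustrative geometry)."""
    radius_cm = 0.5 * diameter_pc * CM_PER_PC
    volume_cm3 = 4.0 / 3.0 * math.pi * radius_cm ** 3
    mass_g = volume_cm3 * n_h_cm3 * MASS_PER_H_NUCLEUS * M_HYDROGEN_G
    return mass_g / M_SUN_G


mass_typical = sphere_mass_msun(0.004, 1.0e6)
mass_dense = sphere_mass_msun(0.004, 1.5e6)
print(f"{mass_typical:.4f} Msun, {mass_dense:.4f} Msun")
```

Both values are of order 10⁻³ solar masses, consistent with the upper limit of about 0.005 M☉ given in the text.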

The ALMA images also show CO emission ripples”® along the
surface of the molecular cloud (undulations separated by less than about
5″ ≈ 0.01 pc in Fig. 2b), which are indicative of instabilities at the
dissociation front. Such small-scale corrugations resemble the
‘thin-shell’ instability produced by the force imbalance between thermal
(isotropic) and ram (parallel to the flow) pressures”’. Characterizing
these interface instabilities in detail would require new magneto- 
hydrodynamic models that include mesh-resolutions that are well 
below the 0.1-0.01 pc scales achieved in current simulations”> and 
include neutral gas thermochemistry. 

Finally, ALMA reveals fainter HCO⁺ and CO emission in the atomic
layer (HCO⁺ globulettes and plume-like CO features at δx < 15″,
Fig. 2a, b). The dense gas HCO⁺ emission structures must have
survived the passage of the dissociation front*’, whereas the CO plumes
may trace either warm CO that reforms in situ in the atomic layer or
molecular gas that advects or photoablates”* from the surface of the
molecular cloud. In the latter case, the pressure difference between the 
compressed molecular layers and the lower-density atomic layer would 
favour such a flow. Interestingly, molecular line profiles from the plumes 
typically show two velocity components, one of them identical to that of 
gas from inside the Orion Bar (Extended Data Fig. 4). This kinematic 
association supports the presence of photoablative flows through the 
atomic layer, and generally agrees with the suggested role of dynamical 
and non-equilibrium effects in ultraviolet-irradiated clouds. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 


Received 13 February; accepted 8 June 2016. 
Published online 10 August 2016. 


1. Walmsley, C. M., Natta, A., Oliva, E. & Testi, L. The structure of the Orion Bar. 
Astron. Astrophys. 364, 301-317 (2000). 

2. Tielens, A. G. G. M. & Hollenbach, D. J. Photodissociation regions.
I: basic model. Astrophys. J. 291, 722-754 (1985).

3. Andree-Labsch, S., Ossenkopf, V. & Röllig, M. 3D modelling of clumpy PDRs:
understanding the Orion Bar stratification. Preprint at http://arxiv.org/ 
abs/1405.5553 (2014). 

4. Tielens, A. G. G. M. et al. Anatomy of the photodissociation region in the 
Orion Bar. Science 262, 86-89 (1993). 

5. Hogerheijde, M. R., Jansen, D. J. & van Dishoeck, E. F. Millimeter and
submillimeter observations of the Orion Bar. I: physical structure.
Astron. Astrophys. 294, 792-810 (1995).

6. Young Owl, R. C., Meixner, M. M., Wolfire, M., Tielens, A. G. G. M. & Tauber, J.
HCN and HCO⁺ images of the Orion Bar photodissociation region. Astrophys. J.
540, 886-906 (2000). 

7. Sternberg, A. & Dalgarno, A. Chemistry in dense photon-dominated regions. 
Astrophys. J. Suppl. Ser. 99, 565-607 (1995). 

8. Le Petit, F., Nehmé, C., Le Bourlot, J. & Roueff, E. A model for atomic and
molecular interstellar gas: The Meudon PDR code. Astrophys. J. Suppl. Ser. 164, 
506-529 (2006). 

9. Röllig, M. et al. A photon dominated region code comparison study. Astron.
Astrophys. 467, 187-206 (2007). 

10. Genzel, R. & Stutzki, J. The Orion molecular cloud and star-forming region. 
Annu. Rev. Astron. Astrophys. 27, 41-85 (1989). 

11. Goicoechea, J. R. et al. Velocity-resolved [C II] emission and [C II]/FIR mapping
along Orion with Herschel. Astrophys. J. 812, 75 (2015). 

12. O'Dell, C. R. The Orion Nebula and its associated population. Annu. Rev. 
Astron. Astrophys. 39, 99-136 (2001). 

13. van der Werf, P. P., Goss, W. M. & O'Dell, C. R. Tearing the veil: interaction of
the Orion Nebula with its neutral environment. Astrophys. J. 762, 101 (2013). 

14. Hollenbach, D. J. & Tielens, A. G. G. M. Photodissociation regions in the 
interstellar medium of galaxies. Rev. Mod. Phys. 71, 173-230 (1999). 

15. Weilbacher, P. M. et al. A MUSE map of the central Orion Nebula (M 42).
Astron. Astrophys. 582, A114 (2015). 

16. Draine, B. T. Physics of the Interstellar and Intergalactic Medium (Princeton Univ. 
Press, 2011). 

17. Bertoldi, F. & Draine, B. T. Nonequilibrium photodissociation regions: 
ionization-dissociation fronts. Astrophys. J. 458, 222-232 (1996). 

18. Störzer, H. & Hollenbach, D. J. Nonequilibrium photodissociation regions with
advancing ionization fronts. Astrophys. J. 495, 853-870 (1998). 

19. Lis, D. C. & Schilke, P. Dense molecular clumps in the Orion Bar 
photon-dominated region. Astrophys. J. 597, L145-L148 (2003). 

20. Spitzer, L. Physical Processes in the Interstellar Medium (Wiley, 1978). 

21. Hill, J. K. & Hollenbach, D. J. Effects of expanding compact H II regions 
upon molecular clouds: molecular dissociation waves, shock waves, and 
carbon ionization. Astrophys. J. 225, 390-404 (1978). 

22. Hosokawa, T. & Inutsuka, S.-i. Dynamical expansion of ionization and 
dissociation front around a massive star. II: on the generality of triggered
star formation. Astrophys. J. 646, 240-257 (2006). 

23. Hennebelle, P. & Falgarone, E. Turbulent molecular clouds. Astron. Astrophys. 
Rev. 20, 55 (2012). 

24. Federrath, C. & Klessen, R. S. On the star formation efficiency of turbulent 
magnetized clouds. Astrophys. J. 763, 51 (2013). 

25. Tremblin, P., Audit, E., Minier, V., Schmidt, W. & Schneider, N. Three- 
dimensional simulations of globule and pillar formation around H II regions: 
turbulence and shock curvature. Astron. Astrophys. 546, A33 (2012). 

26. Gorti, U. & Hollenbach, D. J. Photoevaporation of clumps in photodissociation 
regions. Astrophys. J. 573, 215-237 (2002). 

27. Elmegreen, B. G. & Lada, C. J. Sequential formation of subgroups in 
OB associations. Astrophys. J. 214, 725-741 (1977). 

28. Berné, O., Marcelino, N. & Cernicharo, J. Waves on the surface of the Orion 
molecular cloud. Nature 466, 947-949 (2010). 

29. Garcia-Segura, G. & Franco, J. From ultracompact to extended H II regions. 
Astrophys. J. 469, 171-188 (1996). 

30. Lefloch, B. & Lazareff, B. Cometary globules. I: formation, evolution and
morphology. Astron. Astrophys. 289, 559-578 (1994). 


Acknowledgements We thank the ERC for support under grant ERC-2013- 
Syg-610256-NANOCOSMOS. We also thank MINECO, Spain, for funding 
support under grants CSD2009-00038 and AYA2012-32032. This work was
in part supported by the French CNRS programme ‘Physique et Chimie du
Milieu Interstellaire’. We thank P. Schilke and D. Lis for sharing their IRAM-PdBI
observations of the H¹³CN J = 1–0 condensations inside the Orion Bar, and
M. Walmsley for sharing his H₂ v = 1–0 S(1) and O I 1.3 μm infrared images.
ALMA is a partnership of the ESO (representing its member states), the NSF
(USA) and NINS (Japan), together with the NRC (Canada), the NSC and ASIAA
(Taiwan) and KASI (Republic of Korea) in cooperation with the Republic of 
Chile. The Joint ALMA Observatory is operated by the ESO, the AUI/NRAO and 
the NAOJ. This Letter makes use of observations obtained with the IRAM 30 m 
telescope. IRAM is supported by the INSU/CNRS (France), the MPG (Germany), 
and the IGN (Spain). 


Author Contributions J.R.G. was the principal investigator of the ALMA project.
He led the scientific analysis and modelling and wrote the manuscript. J.P. and
E.C. carried out the ALMA data calibration and data reduction. S.C. and N.M.
carried out the single-dish map observations with the IRAM 30 m telescope.
All authors participated in the discussion of results, determination of the 
conclusions and revision of the manuscript. 


Author Information We used the ALMA data ADS/JAO.ALMA#2012.1.00352.S 
available at https://almascience.eso.org/aq/?project_code=2012.1.00352.S. 
Reprints and permissions information is available at www.nature.com/reprints. 
The authors declare no competing financial interests. Readers are welcome to 
comment on the online version of the paper. Correspondence and requests for 
materials should be addressed to J.R.G. (jr.goicoechea@icmm.csic.es). 


Reviewer Information Nature thanks R. Plume and the other anonymous 
reviewer(s) for their contribution to the peer review of this work. 


8 SEPTEMBER 2016 | VOL 537 | NATURE | 209 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 




METHODS 

ALMA interferometric and IRAM 30-m single-dish observations. ALMA
Cycle-1 observations of the Orion Bar were carried out using 27 12-m antennae
in band 7 at 345.796 GHz (CO J = 3-2) and 356.734 GHz (HCO⁺ J = 4-3). The
observations consisted of a 27-pointing mosaic (the array points to 27 different
positions to cover the field) centred at right ascension α(2000) = 5 h 35 min 20.6 s;
declination δ(2000) = −05° 25′ 20″. The total field-of-view is 58″ × 52″. Baseline
configurations from about 12 m to about 444 m were used (C32-3 antennae
configuration). Lines were observed with correlators providing a resolution of
approximately 500 kHz (δv ≈ 0.4 km s⁻¹) over a 937.5 MHz bandwidth. The total
observation time on the ALMA 12-m array was around 2 h. ALMA execution blocks
were first calibrated in the CASA software (version 4.2.0) and then exported to
GILDAS. To recover the large-scale extended emission filtered out by the
interferometer, we used fully sampled single-dish maps as ‘zero-’ and ‘short-spacings’.
Maps were obtained with the IRAM 30-m telescope (Pico Veleta, Spain) using the
EMIR330 receiver under excellent winter conditions (<1 mm of precipitable water
vapour). On-the-fly scans of a 170″ × 170″ region were obtained both along and
perpendicular to the Orion Bar. The beam full-width at half-maximum power
(FWHM) at 350 GHz is about 7″. The GILDAS/MAPPING software was used
to create the short-spacing visibilities not sampled by ALMA. These visibilities
were merged with the interferometric observations. Each mosaic field was imaged
and a dirty mosaic was built. The dirty image was deconvolved using the standard
Högbom CLEAN algorithm and the resulting cubes were scaled from Jansky per
beam to a brightness temperature scale using the synthesized beam size of about 1″.
This resolution is a factor of approximately 9 higher than previous interferometric
observations of the HCO⁺ J = 1-0 line towards the Orion Bar. The achieved root
mean squared noise is about 0.4 K per 0.4 km s⁻¹ channel, with an absolute flux
accuracy of about 10%. The resulting images are shown in Figs 1b and 2 and in
Extended Data Fig. 2. Finally, the large-scale HCO⁺ J = 3-2 (267.558 GHz) on-the-fly
map shown in Fig. 1a was taken with the multi-beam receiver HERA, also at
the IRAM 30-m telescope. The spectral and angular resolutions are approximately
0.4 km s⁻¹ and 9″ (FWHM) respectively. The final images were generated using
the GILDAS/GREG software.
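
As a rough cross-check of two numbers quoted above, the sketch below converts the 500 kHz channel spacing into a velocity width and converts 1 Jy per beam into a Rayleigh-Jeans brightness temperature for a 1″ beam. It is only an illustration and not the GILDAS procedure used by the authors: it assumes a circular Gaussian beam, and the function names are hypothetical.

```python
import math

# Physical constants (SI units).
C_LIGHT = 2.99792458e8                # speed of light, m/s
K_BOLTZ = 1.380649e-23                # Boltzmann constant, J/K
ARCSEC = math.pi / (180.0 * 3600.0)   # one arcsecond in radians


def channel_width_kms(delta_nu_hz: float, nu_hz: float) -> float:
    """Velocity width of a spectral channel, dv = c * dnu / nu, in km/s."""
    return C_LIGHT * delta_nu_hz / nu_hz / 1.0e3


def jy_per_beam_to_kelvin(flux_jy: float, nu_hz: float, fwhm_arcsec: float) -> float:
    """Rayleigh-Jeans brightness temperature for a circular Gaussian beam.

    ASSUMPTION: the synthesized beam is treated as a circular Gaussian of the
    given FWHM; a real beam is elliptical.
    """
    theta = fwhm_arcsec * ARCSEC
    beam_solid_angle = math.pi * theta**2 / (4.0 * math.log(2.0))  # steradian
    flux_si = flux_jy * 1.0e-26                                    # W m^-2 Hz^-1
    return C_LIGHT**2 * flux_si / (2.0 * K_BOLTZ * nu_hz**2 * beam_solid_angle)


NU_HCOP_43 = 356.734e9  # HCO+ J = 4-3 frequency quoted in the text, Hz

dv = channel_width_kms(500.0e3, NU_HCOP_43)
kelvin_per_jy = jy_per_beam_to_kelvin(1.0, NU_HCOP_43, 1.0)
print(f"channel width: {dv:.2f} km/s, conversion: {kelvin_per_jy:.1f} K per Jy/beam")
```

With these inputs the channel width comes out close to the quoted 0.4 km s⁻¹, and 1 Jy per beam corresponds to roughly 10 K for a 1″ beam.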

Saturation and extinction corrections for the near-infrared image. To better
understand the spatial distribution of the H₂ v = 1-0 S(1) line emission at
λ = 2.12 μm (H₂*) presented in ref. 1 and shown in Extended Data Fig. 2, we note
two effects that determine the resulting emission morphology. First, there is a
bright star in the line of sight towards the Orion Bar (θ² A Ori at α(2000) =
5 h 35 min 22.9 s; δ(2000) = −05° 24′ 57.8″) that saturates the near-infrared detectors
in a slit of width approximately 4″ parallel to the Orion Bar (roughly between
δx = 19″ and 23″ in our rotated images). Hence, no H₂ data are shown in this range.
Therefore, the layers with H₂ vibrational emission are wider than suggested by
Extended Data Fig. 2, and more H₂ emission peaks may coincide with HCO⁺ peaks
in the blanked δx = 19″-23″ region. Older near-infrared images with lower angular
and spectral resolutions do show that the H₂ emission extends out to δx ≈ 20″.
Second, dust extinction (due to foreground dust in Orion's Veil and also due to
dust in the Orion Bar itself) may affect the apparent morphology of the
near-infrared images. Such effects are often neglected and are not included in
Extended Data Fig. 2. The extinction towards the Orion Bar produced by the Veil
is not greater than about 2 mag (ref. 34). Adopting a dust reddening appropriate
to Orion, $R_V = A_V/E(B-V) = 5.5$, and the $A_K/A_V$ value in ref. 35 (where B and V
stand for the blue and visible photometric filters at 4,400 and 5,500 Å respectively,
$A_V$ and $A_K$ are the extinctions in the visible and in the K filter at 2.2 μm,
$E(B-V) = A_B - A_V$ is the reddening factor, and $R_V$ is a dimensionless parameter
that characterizes the slope of the extinction curve), we estimate that the H₂ emission
lines would only be approximately 30% brighter if foreground extinction corrections
are taken into account. An additional magnitude of extinction due to dust in
the atomic layer of the Bar itself results in a line intensity increase of about 50%.
Therefore, minor morphological differences between the near-infrared and
millimetre-wave images could reflect small-scale or patchy extinction differences
in the region.
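
The size of the extinction correction follows from the definition of a magnitude: a line observed through $A_K$ magnitudes of extinction is intrinsically brighter by a factor $10^{0.4 A_K}$. The sketch below illustrates this. The $A_K/A_V$ ratio is an assumption chosen for illustration, because the value taken from ref. 35 is not quoted in the text.

```python
def brightening_factor(a_v_mag: float, ak_over_av: float) -> float:
    """Ratio of intrinsic to observed K-band line intensity, 10**(0.4 * A_K)."""
    a_k = ak_over_av * a_v_mag
    return 10.0 ** (0.4 * a_k)


# ASSUMPTION: illustrative A_K/A_V ratio. The paper adopts the value of ref. 35,
# which is not given in the text; 0.14 is picked here only because it reproduces
# the quoted corrections.
AK_OVER_AV = 0.14

veil_only = brightening_factor(2.0, AK_OVER_AV)      # Veil: A_V of about 2 mag
veil_plus_bar = brightening_factor(3.0, AK_OVER_AV)  # one extra magnitude in the Bar

print(f"Veil only: x{veil_only:.2f}, Veil plus Bar: x{veil_plus_bar:.2f}")
```

Under this assumed ratio the two factors come out near 1.3 and 1.5, in line with the 30% and 50% figures in the text.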

Excitation and radiative transfer models for CO and HCO⁺. To estimate the
physical conditions of the HCO⁺-emitting gas near the dissociation front we ran
a grid of non-local, non-local thermodynamic equilibrium (non-LTE) excitation and
radiative transfer (Monte Carlo) models. This approach allows us to explore different
column densities, gas temperatures and densities. Compared with most PDR
models (using local escape probability approximations) our models take radiative
pumping, line trapping and opacity broadening into account. This allows for the
treatment of optically thick lines (see the appendix in ref. 36 for code details and
benchmarking tests). Our models use the most recent inelastic collisional rates of
HCO⁺ with H₂ and with electrons, and of CO with both H₂ and H. The electron
density, $n_e$, is an important factor in the collisional excitation of molecular cations
in a far-ultraviolet-illuminated gas. For HCO⁺, collisions with electrons start to
contribute above $n_e > 10$ cm⁻³ (or $n_{\rm H} > 10^5$ cm⁻³ if most of the electrons are
provided by carbon atom ionization). In PDRs, collisions of molecules with H atoms
can also contribute because the molecular gas fraction,
$f = 2n({\rm H_2})/n_{\rm H} = 2n({\rm H_2})/[n({\rm H}) + 2n({\rm H_2})]$, is not 1 (a fully molecular gas).
We adopted f = 0.8 and varied $x_e$ between 0 and $10^{-4}$. The H₂ ortho-to-para ratio
was computed for each gas temperature T. Radiative excitation by the cosmic
microwave background ($T_{\rm CMB} = 2.7$ K) and by the far-infrared dust continuum in the
Orion Bar (simulated by optically thin thermal emission at $T_{\rm dust} = 55$ K) were also included.
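
The definitions above fix how a given density of hydrogen nuclei splits among the collision partners. The short sketch below makes that bookkeeping explicit. It assumes that $x_e$ denotes $n_e/n_{\rm H}$, which the text implies but does not state, and the function name is hypothetical.

```python
def collision_partner_densities(n_h: float, f_mol: float, x_e: float) -> dict:
    """Split n_H (hydrogen nuclei per cm^3) into H2, H and electron densities."""
    n_h2 = 0.5 * f_mol * n_h         # from f = 2 n(H2) / n_H
    n_hatom = (1.0 - f_mol) * n_h    # from n_H = n(H) + 2 n(H2)
    n_e = x_e * n_h                  # ASSUMPTION: x_e is defined as n_e / n_H
    return {"H2": n_h2, "H": n_hatom, "e": n_e}


# Adopted molecular fraction f = 0.8 and the upper end of the x_e range.
partners = collision_partner_densities(n_h=1.0e5, f_mol=0.8, x_e=1.0e-4)
print(partners)
```

For $n_{\rm H} = 10^5$ cm⁻³ and $x_e = 10^{-4}$ the electron density is 10 cm⁻³, which is the threshold quoted above for electron collisions to matter.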

Column densities of $N({\rm HCO^+}) = (5 \pm 1) \times 10^{13}$ cm⁻² and
$N({\rm CO}) = (1.0 \pm 0.5) \times 10^{18}$ cm⁻² were estimated using information from our
IRAM 30-m telescope line-survey towards the dissociation front. Several HCO⁺, H¹³CO⁺,
HC¹⁸O⁺ and C¹⁸O rotational lines were included in the estimation (the quoted dispersions
in the column densities reflect the uncertainty obtained from least-squares fits to
rotational population diagrams). They are consistent with previous observations
in the region. Radiative transfer models were run for $N({\rm HCO^+}) = 5 \times 10^{13}$ cm⁻²,
$N({\rm CO}) = 1.0 \times 10^{18}$ cm⁻², and $N_{\rm H} = N({\rm H}) + 2N({\rm H_2}) \approx 2 \times 10^{22}$ cm⁻²
(equivalent to $A_V \approx 7$ mag for the dust properties in Orion). This results in
$x({\rm HCO^+}) = (2-3) \times 10^{-9}$ and $x({\rm CO}) \approx (2.5-7.5) \times 10^{-5}$ abundances. In addition,
the HCO⁺/H¹³CO⁺ column density ratios derived from single-dish observations are
similar to the ¹²C/¹³C = 67 isotopic ratio in Orion. Thus, the H¹²CO⁺ lines
are not very opaque ($\tau_{\rm line} \lesssim 2$), otherwise the observed HCO⁺/H¹³CO⁺ line
intensity ratios would be considerably smaller. A non-thermal (turbulent) velocity
dispersion ($\sigma_{\rm nth}$) of about 1 km s⁻¹ reproduces the observed line widths. A similar
value, 1.0-1.5 km s⁻¹, is inferred directly from the observed line profiles
($\sigma^2_{\rm nth} = \sigma^2_{\rm obs} - \sigma^2_{\rm th}$, with $\Delta v_{\rm FWHM} = 2\sqrt{2\ln 2}\,\sigma_{\rm obs} \approx 3.0 \pm 0.5$ km s⁻¹ and T = 300 K).
Hence, opacity broadening plays a minor role. The dispersion $\sigma_{\rm nth}$ is similar to or
lower than the local speed of sound at T = 100-300 K ($c_{\rm PDR} = (k_{\rm B}T/m)^{1/2} =
1.0-1.7$ km s⁻¹, where m is the mean mass per particle and $k_{\rm B}$ is the Boltzmann
constant). This results in moderate Mach numbers $M = \sigma_{\rm nth}/c_{\rm PDR} \lesssim 1$.
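
The abundances and the non-thermal dispersion quoted above can be reproduced with a few lines of arithmetic, sketched below. The thermal width is evaluated for the HCO⁺ mass (29 atomic mass units), which is an assumption about how the subtraction was done. The function names are hypothetical.

```python
import math

K_BOLTZ = 1.380649e-23   # Boltzmann constant, J/K
AMU = 1.66053907e-27     # atomic mass unit, kg


def abundance(n_species: float, n_h_total: float) -> float:
    """Abundance relative to hydrogen nuclei, x = N(species) / N_H."""
    return n_species / n_h_total


def nonthermal_dispersion_kms(fwhm_kms: float, t_kin: float, mass_amu: float) -> float:
    """sigma_nth from sigma_nth^2 = sigma_obs^2 - sigma_th^2, in km/s."""
    sigma_obs = fwhm_kms / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    sigma_th = math.sqrt(K_BOLTZ * t_kin / (mass_amu * AMU)) / 1.0e3
    return math.sqrt(sigma_obs**2 - sigma_th**2)


x_hcop = abundance(5.0e13, 2.0e22)   # column densities quoted in the text
x_co = abundance(1.0e18, 2.0e22)

# ASSUMPTION: thermal width computed for the HCO+ molecular mass (29 amu).
sigma_nth = nonthermal_dispersion_kms(3.0, 300.0, 29.0)

print(f"x(HCO+) = {x_hcop:.1e}, x(CO) = {x_co:.1e}, sigma_nth = {sigma_nth:.2f} km/s")
```

Running the same function for line widths of 2.5 and 3.5 km s⁻¹ brackets the 1.0-1.5 km s⁻¹ range given in the text.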

Extended Data Fig. 3 shows model predictions for the CO J = 3-2 line intensity
peak, $T^{\rm CO\,3-2}_{\rm peak}$ (upper left panel), and HCO⁺ J = 4-3 line integrated intensity,
$W^{\rm HCO^+\,4-3} = \int T_{\rm B}\,dv$ (where $T_{\rm B}$ is the line brightness temperature) (K km s⁻¹), for
different T and $n_{\rm H}$ values. For optically thick lines ($\tau_{\rm line} \gg 1$), $T^{\rm CO\,3-2}_{\rm peak}$ provides
a good measure of the excitation temperature, with
$T^{\rm CO\,3-2}_{\rm peak} \approx J(T_{\rm ex}) = (E_{\rm up}/k_{\rm B}) \times (e^{E_{\rm up}/k_{\rm B}T_{\rm ex}} - 1)^{-1}$
(where $T_{\rm ex}$ is the excitation temperature of the transition and
$E_{\rm up}$ is the upper level energy). In addition, for low-critical-density ($n_{\rm cr}$) transitions
such as the low-J CO transitions, the lines are close to thermalization at densities
above about $10^4$ cm⁻³, thus $T_{\rm ex} \to T$ (with $n_{\rm cr} = A_{ij}/\gamma_{ij}$, where $A_{ij}$ is the Einstein
coefficient for spontaneous emission and $\gamma_{ij}$ is the coefficient of the collisional
de-excitation rate). In this case, $T^{\rm CO\,3-2}_{\rm peak}$ is a good thermometer of the $\tau_{\rm CO\,3-2} > 1$ emitting
layers. The HCO⁺ J = 4-3 line, however, has much higher critical densities
($n_{\rm cr,H_2} \gtrsim 5 \times 10^6$ cm⁻³ and $n_{\rm cr,e} \simeq 10^3$ e cm⁻³). For $n_{\rm H} < 2n_{\rm cr,H_2}/\tau_{\rm line}$ (sub-thermal
excitation), the integrated line intensity $W^{\rm HCO^+\,4-3}$ is approximately linearly
proportional to $N({\rm HCO^+}) = x({\rm HCO^+})\,n_{\rm H}\,l$ (where l is the cloud length along the line of
sight) even if the line is moderately thick. PDR models and CO observations
respectively show that $x({\rm HCO^+})$ and T do not change substantially in the PDR
layers around the H₂ emission peaks (cloud depths between $A_V \approx 1$ and 2 mag).
In a nearly edge-on PDR, the spatial length along the line of sight does not change
greatly either. We compute that for the inferred T and N(HCO⁺) values in the region,
the integrated line intensity $W^{\rm HCO^+\,4-3}$ is proportional to the density in the
$n_{\rm H} = 10^4$-$10^6$ cm⁻³ range (the correlation coefficient is r ≈ 0.98 for models with
$x_e = 0$ and $x_e = 10^{-4}$). Moreover, $W^{\rm HCO^+\,4-3}$ still increases with density up to
several $10^6$ cm⁻³ (r = 0.94). This reasoning justifies the use of $W^{\rm HCO^+\,4-3}$ as a proxy
for $n_{\rm H}$ in the region.
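
For an optically thick, thermalized line, the relation between peak temperature and excitation temperature given above can be inverted in closed form. The sketch below implements the expression as it is written in the text, with the characteristic temperature left as a parameter. The value used for $E_{\rm up}/k_{\rm B}$ of the CO J = 3 level is taken from standard line catalogues and is an assumption here, not a number from the paper.

```python
import math


def j_temperature(t_ex: float, t0: float) -> float:
    """J(T_ex) = T0 / (exp(T0 / T_ex) - 1), with T0 the characteristic temperature."""
    return t0 / math.expm1(t0 / t_ex)


def excitation_temperature(t_peak: float, t0: float) -> float:
    """Invert J(T_ex) = T_peak analytically: T_ex = T0 / ln(1 + T0 / T_peak)."""
    return t0 / math.log1p(t0 / t_peak)


# ASSUMPTION: E_up/k_B of the CO J = 3 level in kelvin (standard catalogue value).
T0_CO32 = 33.19

t_ex = excitation_temperature(164.0, T0_CO32)
print(f"T_ex = {t_ex:.0f} K for a peak temperature of 164 K")
```

Because the inversion is exact, feeding the result back into `j_temperature` recovers the input peak temperature.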

Average physical conditions in the compressed structures. The physical conditions
that reproduce the mean CO J = 3-2 line peak and HCO⁺ J = 4-3 integrated
line intensity towards the compressed structures at δx ≈ 15″
($T^{\rm CO\,3-2}_{\rm peak} = 164 \pm 10$ K and $W^{\rm HCO^+\,4-3} = 69 \pm 18$ K km s⁻¹) are T = 200-300 K and
$n_{\rm H} = (1.0 \pm 0.5) \times 10^6$ cm⁻³ (Extended Data Fig. 3). This implies high thermal pressures,
$P_{\rm th,comp}/k = n_{\rm H}T \approx (1.0-4.5) \times 10^8$ K cm⁻³ (where $P_{\rm th,comp}$ is the pressure in the
compressed gas component). The brightest HCO⁺ emission peaks (with
$W^{\rm HCO^+\,4-3} = 100$ K km s⁻¹, Fig. 2a) probably correspond to specific gas density
enhancements. For the range of column densities and physical conditions at
δx ≈ 15″, the gas temperature uncertainty is determined by the lack of higher-J CO
lines, observed at high angular resolution, to better constrain T from excitation
models. The range of estimated gas densities is dominated by the dispersion (about
25%) of the mean $W^{\rm HCO^+\,4-3}$ value.
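
The quoted pressure range is simply the product of the extreme densities and temperatures. A minimal sketch of that arithmetic follows, with a conversion to cgs pressure units added for convenience. The function name is hypothetical.

```python
K_BOLTZ_CGS = 1.380649e-16  # Boltzmann constant, erg/K


def thermal_pressure_over_k(n_h: float, t_gas: float) -> float:
    """Thermal pressure divided by k, P_th/k = n_H * T, in K cm^-3."""
    return n_h * t_gas


# n_H = (1.0 +/- 0.5) x 10^6 cm^-3 and T = 200-300 K, as quoted in the text.
p_min = thermal_pressure_over_k(0.5e6, 200.0)
p_max = thermal_pressure_over_k(1.5e6, 300.0)

print(f"P_th/k = {p_min:.1e} to {p_max:.1e} K cm^-3")
print(f"P_th   = {p_min * K_BOLTZ_CGS:.1e} to {p_max * K_BOLTZ_CGS:.1e} erg cm^-3")
```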

The above physical conditions suggest that the cloud edge contains substructures
that are denser than the atomic layer ($n_{\rm H} = (4-5) \times 10^4$ cm⁻³) and denser
than the ambient molecular cloud ($n_{\rm H} = (0.5-1.0) \times 10^5$ cm⁻³). The equivalent
length of the substructures is small, $l = N_{\rm H}/n_{\rm H} \approx (4-12) \times 10^{-3}$ pc (where $N_{\rm H}$ is the
total column density of hydrogen nuclei along the line of sight), or about 2″-6″ at the
distance to Orion, and thus consistent with their apparent size in the ALMA image.
The mass of a cylinder with $n_{\rm H}$ of a few $10^6$ cm⁻³, 2″-6″ length and width of 2″ is
≲0.005 M⊙ (that is, a mass per unit length of (0.3-1.0) M⊙ pc⁻¹). This is much lower
than the virial and critical masses needed to make them gravitationally unstable
(approximately 5 M⊙, from the inferred gas temperature, density and velocity
dispersion). H₂ clumps of similar small masses (several 0.001 M⊙) have been
inferred towards the boundary of more evolved and distant H II regions. Compression
and fragmentation of ultraviolet-irradiated cloud edges must be a common
phenomenon in the vicinity of young massive stars.
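
The length, angular size and mass estimates above can be checked with the sketch below. Two inputs are assumptions that do not appear in the text: a distance to Orion of 414 pc and a mean mass of 1.4 hydrogen masses per hydrogen nucleus (to account for helium). The function names are hypothetical.

```python
import math

PC_CM = 3.0857e18        # parsec in cm
AU_CM = 1.495978707e13   # astronomical unit in cm
M_SUN_G = 1.989e33       # solar mass in g
M_H_G = 1.6735e-24       # hydrogen atom mass in g

# ASSUMPTIONS (not stated in the text).
DISTANCE_PC = 414.0      # adopted distance to Orion
MU_PER_H = 1.4           # mean mass per hydrogen nucleus, in units of m_H


def equivalent_length_cm(column_density: float, volume_density: float) -> float:
    """Line-of-sight length l = N_H / n_H, in cm."""
    return column_density / volume_density


def cm_to_arcsec(length_cm: float, distance_pc: float = DISTANCE_PC) -> float:
    """Angular size; 1 arcsec subtends distance_pc astronomical units."""
    return length_cm / (distance_pc * AU_CM)


def cylinder_mass_msun(n_h: float, length_arcsec: float, width_arcsec: float,
                       distance_pc: float = DISTANCE_PC) -> float:
    """Mass of a uniform cylinder whose diameter is the quoted width."""
    scale_cm = distance_pc * AU_CM            # cm per arcsecond
    radius = 0.5 * width_arcsec * scale_cm
    length = length_arcsec * scale_cm
    volume = math.pi * radius**2 * length
    return n_h * MU_PER_H * M_H_G * volume / M_SUN_G


l_cm = equivalent_length_cm(2.0e22, 1.0e6)
l_pc = l_cm / PC_CM
l_arcsec = cm_to_arcsec(l_cm)
mass = cylinder_mass_msun(1.0e6, 6.0, 2.0)

print(f"l = {l_pc:.1e} pc = {l_arcsec:.1f} arcsec, cylinder mass = {mass:.4f} Msun")
```

For the central density the length falls inside the quoted (4-12) × 10⁻³ pc range, and the cylinder mass is of order 0.005 M⊙, far below the roughly 5 M⊙ needed for gravitational instability.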

Physical conditions in the ambient molecular cloud. Deeper inside the molecular
cloud, $T^{\rm CO\,3-2}_{\rm peak}$ smoothly decreases from about 170 K to about 130 K. Therefore,
these observations do not suggest temperature spikes at scales of a few arcseconds.
Deeper inside the molecular cloud (δx > 30″ in our rotated images), both N(H)
and N(HCO⁺) are expected to gradually increase. For the expected
$N({\rm HCO^+}) \approx 2 \times 10^{14}$ cm⁻² column density, excitation models show that the gas
density in the ambient cloud is $n_{\rm H} \approx (0.5-1.0) \times 10^5$ cm⁻³ (dashed curves in
Extended Data Fig. 3), in agreement with previous estimations. Hence, the
over-dense substructures have compression factors of approximately 5-30 with respect
to the ambient molecular gas.

Physical conditions in the atomic layer. The decrease of both $T^{\rm CO\,3-2}_{\rm peak}$ and $W^{\rm HCO^+\,4-3}$
between the ionization and dissociation fronts is consistent with the expected sharp
decrease of CO and HCO⁺ abundances in the atomic layer. The representative gas
density in the atomic layer, $n_{\rm H} \approx (4-5) \times 10^4$ cm⁻³, is constrained by the strength
of the unattenuated far-ultraviolet flux at the Bar edge (χ ≈ 4.4 × 10⁴, determined
by the spectral type of the Trapezium stars) and by the current position of the
dissociation front at δx ≈ 15″ (refs 1 and 33). The exact gas density value, however,
depends on the assumed far-ultraviolet-extinction grain properties (which probably
vary as a function of cloud depth). In the context of stationary PDR models,
larger-than-standard-size grains (lower far-ultraviolet absorption cross-sections)
are often invoked, otherwise the separation between the dissociation and ionization
fronts would be smaller than the observed value of around 15″. The lower densities
in the atomic layer agree with the observed low H₂ v = 1-0 S(1)/v = 2-1 S(1) ≈ 3
line intensity ratio attributed to fluorescent H₂ excitation. We note that optically
thin CO emission implies $T^{\rm CO\,3-2}_{\rm peak} < T_{\rm ex}$. Hence, $T^{\rm CO\,3-2}_{\rm peak}$ can no longer be used
as a gas thermometer in the atomic layer where the CO abundance is low. The
gas temperature close to the dissociation front is between T ≈ 500 K (from H I
observations) and T = 300 K (from carbon radio recombination and [C II]
158 μm (ref. 11) line observations).

Emission probability distribution functions (PDF). To study the distribution of
gas densities in the region, approximated by the HCO⁺ J = 4-3 emission, we analysed
the probability distribution of the logarithmic emission, given by
$z = \ln\left(W^{\rm HCO^+\,4-3}/\langle W^{\rm HCO^+\,4-3}\rangle\right)$, where $\langle W^{\rm HCO^+\,4-3}\rangle$
(with $W^{\rm HCO^+\,4-3} = \int T_{\rm B}\,dv$) is the mean value in the
observed field-of-view (37 K km s⁻¹). This is a common approach used to
interpret (column) density maps, both from observations and MHD simulations.
The PDF is computed as the number of pixels (in the high signal-to-noise
$W^{\rm HCO^+\,4-3}$ image) per intensity bin divided by the total number of pixels. We
first analysed the complete field-of-view observed by ALMA and selected $W^{\rm HCO^+\,4-3}$
measurements above 5σ, where we define $\sigma = {\rm rms}\,(2\,\delta v\,\Delta v_{\rm FWHM})^{1/2}$, with
δv = 0.4 km s⁻¹ and $\Delta v_{\rm FWHM} = 3.0$ km s⁻¹. The resulting PDF is shown in Fig. 2d
(magenta points). Second, we selected measurements only in the compressed
layers region between δx = 7″ and 30″ (with respect to the rotated images in Fig. 2).
The resulting PDF (black points) is very close to a log-normal distribution with
$p(z) = N\exp\left(-(z - z_0)^2/2\sigma^2\right)$, where $z_0$ is the peak value and σ the standard
deviation. We obtain $z_0 = 0.165$ and σ = 0.31 from a fit (green curve). If $W^{\rm HCO^+\,4-3}$
is proportional to the gas density, these values imply that 99% of the observed
positions in the compressed layers span a factor of about 6 in density. In MHD
models, σ is a measure of how density varies in a turbulent cloud. Hence, it depends
on the Mach number, the ratio of the thermal to magnetic pressure (β) and the
forcing characteristics of the turbulence. The relatively modest σ value inferred
in the δx = 7″-30″ layer is consistent with the low Mach numbers in the PDR, and
suggests an important role of magnetic pressure. We note that a similar analysis of
the CO emission does not yield the same log-normal distribution. This is consistent
with low-J CO lines being optically thick and tracing gas temperatures rather
than gas density variations. This reinforces that the log-normal shape of the $W^{\rm HCO^+\,4-3}$
PDF in the compressed layer is a relevant observational result.
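
The PDF construction described above (logarithm of the intensity normalized by its mean, then pixel counts per bin divided by the total number of pixels) can be sketched with the standard library alone. The input here is synthetic log-normal data generated with the fitted width σ = 0.31, not the ALMA image. The width is recovered from sample moments, which is a stand-in for the curve fit used in the paper. All function names are hypothetical.

```python
import math
import random


def log_emission(w_values):
    """z = ln(W / <W>) for every pixel value W."""
    mean_w = sum(w_values) / len(w_values)
    return [math.log(w / mean_w) for w in w_values]


def emission_pdf(z_values, z_min, z_max, n_bins):
    """Fraction of pixels per bin: counts divided by the total number of pixels."""
    width = (z_max - z_min) / n_bins
    counts = [0] * n_bins
    for z in z_values:
        k = math.floor((z - z_min) / width)
        if 0 <= k < n_bins:
            counts[k] += 1
    centres = [z_min + (k + 0.5) * width for k in range(n_bins)]
    return centres, [c / len(z_values) for c in counts]


def gaussian_moments(z_values):
    """Mean and standard deviation of z (moment estimate, not a curve fit)."""
    n = len(z_values)
    z0 = sum(z_values) / n
    variance = sum((z - z0) ** 2 for z in z_values) / n
    return z0, math.sqrt(variance)


# Synthetic stand-in for the integrated-intensity image (ASSUMPTION: pure
# log-normal pixel values with the width quoted in the text).
rng = random.Random(0)
w_synthetic = [math.exp(rng.gauss(0.0, 0.31)) for _ in range(20000)]

z = log_emission(w_synthetic)
centres, pdf = emission_pdf(z, -2.0, 2.0, 40)
z0_est, sigma_est = gaussian_moments(z)

# Density contrast spanned by +/- 3 sigma if W is proportional to density.
density_span = math.exp(6.0 * 0.31)
print(f"sigma = {sigma_est:.3f}, +/-3 sigma spans a factor {density_span:.1f}")
```

With σ = 0.31 a ±3σ interval in z corresponds to a factor of about 6 in $W^{\rm HCO^+\,4-3}$, which matches the density contrast quoted in the text.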

Gas pressures, magnetic field and compression. To support the cloud compression
and gas photoablation scenario, we investigated the different contributions to
the gas pressure in the region. The thermal pressure in the H II region near the
ionization front is $P_{\rm th,HII}/k = 2n_eT_e \approx 6 \times 10^7$ K cm⁻³, about six times higher than
the turbulent ram pressure $P_{\rm ram,amb} = \rho\,\sigma^2_{\rm nth,amb}$ in the ambient molecular cloud
(Extended Data Table 1). As we find similar contributions from the thermal and
non-thermal (turbulent) pressures in both the ambient cloud and the over-dense
substructures ($\alpha = P_{\rm nth,amb}/P_{\rm th,amb} \approx P_{\rm nth,comp}/P_{\rm th,comp} \approx 1$), it is reasonable to
assume equipartition of thermal, turbulent and magnetic energies to quantify the
magnetic pressure in the PDR ($P_B = B^2/8\pi$). In particular, for $\beta = P_{\rm th}/P_B = 1$ we
estimate the magnetic field strengths B to be 200 μG and 800 μG in the ambient
and in the high-density substructures, respectively. Such strong magnetic fields at
small scales need to be confirmed observationally (both the strength and the
orientation) but seem consistent with the high values (approximately 100 μG)
measured in the low-density foreground material (the Orion Veil), confirming that B
is particularly strong in the Orion complex. On much larger spatial scales,
low-angular-resolution observations do suggest that B increases with density at
H II/cloud boundaries ($B \propto n_{\rm H}^{0.5-1}$) (ref. 46).
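
Under the equipartition assumption stated above, the field strength follows directly from $P_B = B^2/8\pi$ in cgs units. The sketch below evaluates it for the thermal pressure of the compressed gas and for an illustrative ambient value. The ambient pressure is an assumption built from $n_{\rm H} = 10^5$ cm⁻³ and T = 150 K, since Extended Data Table 1 is not reproduced here. The function name is hypothetical.

```python
import math

K_BOLTZ_CGS = 1.380649e-16  # Boltzmann constant, erg/K


def equipartition_field_microgauss(p_th_over_k: float, beta: float = 1.0) -> float:
    """Field strength from P_B = B^2 / (8 pi), with beta = P_th / P_B.

    p_th_over_k is the thermal pressure divided by k, in K cm^-3.
    """
    p_thermal = p_th_over_k * K_BOLTZ_CGS        # erg cm^-3
    p_magnetic = p_thermal / beta
    return math.sqrt(8.0 * math.pi * p_magnetic) * 1.0e6


# Compressed structures: P_th/k of about 2 x 10^8 K cm^-3.
b_comp = equipartition_field_microgauss(2.0e8)

# ASSUMPTION: illustrative ambient pressure, n_H = 1e5 cm^-3 times T = 150 K.
b_amb = equipartition_field_microgauss(1.0e5 * 150.0)

print(f"B(compressed) = {b_comp:.0f} microgauss, B(ambient) = {b_amb:.0f} microgauss")
```

Both values land close to the 800 μG and 200 μG estimates quoted in the text.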

A strong magnetic field would be associated with large magnetosonic speeds
($v_{\rm ms}$) in the PDR. If an ultraviolet radiation-driven shockwave is responsible for
the molecular gas compression, its velocity is predicted to slow down to
$v_{\rm s} \approx 3$ km s⁻¹ once it enters the molecular cloud. In such a slow, magnetized shock
($v_{\rm s} < v_{\rm ms}$), compression waves can travel ahead of the shock front. Thus, a high
magnetic field strength may be related to the $W^{\rm HCO^+\,4-3}$ undulations seen perpendicular
to the Orion Bar (Fig. 2c). The inferred compression factor in the observed
substructures ($f = n_{\rm comp}/n_{\rm amb} = 5-30$) is consistent with slow shock velocities,
$v_{\rm s} = c_0 f^{0.5} = 1.5-4.0$ km s⁻¹, where $c_0$ is the initial sound speed of the unperturbed
molecular gas. The necessarily small $v_{\rm s}$ agrees with the relatively narrow molecular
line-profiles ($\Delta v_{\rm FWHM} < 4$ km s⁻¹) seen in PDRs (including observations of face-on
sources in which the shock would propagate in the line of sight). Owing to the high
thermal pressure in the compressed structures, we also find that a pressure gradient,
with $P_{\rm th,comp} > P_{\rm th,HII}$, exists. This subtle effect is seen in simulations of an
advancing shockwave around an H II region.
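
The compression factors and the corresponding shock speeds can be reproduced as sketched below. The initial sound speed $c_0$ is not given in the text; the value of 0.7 km s⁻¹ is an assumption chosen because it is consistent with the quoted 1.5-4.0 km s⁻¹ range. The function names are hypothetical.

```python
import math


def compression_factor(n_comp: float, n_amb: float) -> float:
    """f = n_comp / n_amb."""
    return n_comp / n_amb


def shock_speed_kms(c0_kms: float, f: float) -> float:
    """v_s = c0 * f**0.5, the relation used in the text."""
    return c0_kms * math.sqrt(f)


# Extremes of the density ranges quoted in the text (cm^-3).
f_min = compression_factor(0.5e6, 1.0e5)
f_max = compression_factor(1.5e6, 0.5e5)

# ASSUMPTION: initial sound speed of the unperturbed molecular gas, in km/s.
C0_KMS = 0.7

v_min = shock_speed_kms(C0_KMS, f_min)
v_max = shock_speed_kms(C0_KMS, f_max)
print(f"f = {f_min:.0f}-{f_max:.0f}, v_s = {v_min:.1f}-{v_max:.1f} km/s")
```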

Molecular gas between the ionization and dissociation fronts. ALMA reveals
fainter HCO⁺ and CO emission in the atomic layer (HCO⁺ globulettes and
plume-like CO features at δx < 15″, Fig. 2). Previous low-angular-resolution
observations and models had suggested the presence of dense spherical clumps
with sizes of 5″-10″ deeper inside the molecular cloud (at > 15″-20″ from
the ionization front). The dense substructures resolved by ALMA are smaller
(≈2″ × 4″) and are detected at δx > 7″ (even before the peak of the H₂ vibrational
emission).

The molecular line profiles towards the plumes typically show two velocity emission
components (Extended Data Fig. 4): one centred at $v_{\rm LSR} \approx 8.5$ km s⁻¹ (where
$v_{\rm LSR}$ refers to the emission velocity with respect to the local standard of rest), the
velocity of the background molecular cloud in the back-side of M 42 (ref. 11; not
directly associated with the Orion Bar), and the other at $v_{\rm LSR} \approx 11$ km s⁻¹, the velocity
component of the molecular gas in the Orion Bar. In addition, despite the small
size of the observed region, the crosscuts of the HCO⁺ J = 4-3 line velocity centroid
and of the FWHM velocity dispersion show gradients perpendicular to the Orion
Bar (Extended Data Fig. 4). Moving from the ionization front to the molecular gas,
the line centroid shifts to higher velocities (gas compression effects may, in part,
contribute to this redshifted velocity). The velocity dispersion, however, shows
its maximum between the ionization and the dissociation fronts, the expected
layers for photoablative neutral gas flows. Both the kinematic association with the
Orion Bar velocities and the higher velocity dispersion between the two fronts are
consistent with the presence of gas flowing from the high-pressure compressed
molecular layers ($P_{\rm th,comp}/k \approx 2 \times 10^8$ K cm⁻³) to the atomic layers
($P_{\rm th,atomic}/k \approx 5 \times 10^7$ K cm⁻³).

HCO⁺ chemistry and the C⁺/CO transition zone. Static equilibrium PDR
models appropriate to the ambient gas component ($n_{\rm H} \approx 5 \times 10^4$ cm⁻³) reproduce
the separation between the ionization and dissociation fronts. However, they
predict HCO⁺ abundances near the dissociation front that are too low (x(HCO⁺)
of a few 5 x 10-1") to be consistent with the bright ridge of HCO⁺ emission
detected by ALMA. These models also predict that the C⁺/CO transition should
occur ahead of the H/H₂ transition zone and deeper inside the molecular cloud
(at δx ≈ 20″ from the ionization front). However, our detection of bright CO and
HCO⁺ emission towards the layers of bright H₂ vibrational emission implies that
the C⁺/CO transition occurs closer to the cloud edge, and nearly coincides with the
H/H₂ transition (at least it cannot be resolved at the approximately 1″ resolution of
our observations). This is probably another signature of dynamical effects. Indeed,
the presence of molecular gas near the cloud edge, and a reduced C⁺ abundance
deeper inside the molecular cloud, may explain model and observation
discrepancies of other chemically related molecules.

As an example, stationary PDR models applied to the fluorine chemistry
overestimate the CF⁺ column density observed towards the Orion Bar by a
factor of about 10. Given that HF readily forms as F atoms react with H₂ molecules,
CF⁺ must arise from layers where C⁺ and H₂ overlap (CF⁺ forms through
HF + C⁺ → CF⁺ + H reactions and is quickly destroyed by recombination
with electrons). Hence, the (lower-than-predicted) observed CF⁺ abundances
probably reflect a dynamical PDR behaviour as well.

Stationary PDR models of strongly irradiated dense gas (with $n_{\rm H}$ values of a few
$10^6$ cm⁻³) have been presented in the literature. The above densities are similar
to those inferred in the compressed substructures at the Orion Bar edge. Thus they
can be used to gain insight into the chemistry that leads to the formation of HCO⁺
and CO in ultraviolet-irradiated dense gas. Owing to the higher densities and
enhanced H₂ collisional de-excitation heating, the gas attains high temperatures.
This triggers a warm chemistry in which endothermic reactions and reactions
with energy barriers become faster. As a result, higher HCO⁺ abundances are
predicted close to the dissociation front (x(HCO⁺) of several $10^{-9}$). Reactions
of C⁺ with H₂ (either far-ultraviolet-pumped or thermally excited) initiate
the carbon chemistry. This reaction triggers the formation of CH⁺ (explaining
the elevated CH⁺ abundances detected by Herschel) and reduces the abundance
of C⁺ ions and H₂ molecules near the dissociation front; that is, the H/H₂ and
the C⁺/CO transition layers naturally get closer (in $A_V$). Fast exothermic reactions
of CH⁺ with H₂ subsequently produce CH₂⁺ and CH₃⁺. Both hydrocarbon ions
are ‘burnt’ in reactions with abundant oxygen atoms and contribute to the
formation of HCO⁺ at the molecular cloud edge. This HCO⁺ formation
route from CH⁺ can dominate over the formation of HCO⁺ from CO⁺ (after
the O + H₂ → OH + H reaction, followed by C⁺ + OH → CO⁺ + H, and
finally CO⁺ + H₂ → HCO⁺ + H). Both OH and CO⁺ have been detected
in the Orion Bar, but high-angular-resolution maps do not exist.
Recombination of HCO⁺ with electrons then drives CO production near the
dissociation front.

Extrapolating the above chemical scenario, the brightest HCO⁺ J = 4-3
emission peaks in the Orion Bar should be close to H₂ emission peaks. Extended
Data Fig. 2a shows a remarkable spatial agreement between the H₂ v = 1-0 S(1)
emission peaks and several HCO⁺ emission peaks. Detailed H₂ excitation models
(including both far-ultraviolet-pumping and collisions) show that for the
conditions prevailing in the Orion Bar, the intensity of the H₂ v = 1-0 S(1) line is
approximately proportional to the gas density. Therefore, the HCO⁺ peaks that
match the position of the H₂ v = 1-0 S(1) line peaks probably correspond to gas
density enhancements as well. This agrees with the higher H₂ v = 1-0 S(1)/v = 2-1
S(1) ≈ 8 line intensity ratios observed at the dissociation front and is consistent
with efficient H₂ collisional excitation. The ALMA images thus confirm that, in
addition to, or as a consequence of, dynamical effects, reactions of H₂ with abundant
atoms and ions contribute to shift the molecular gas production towards the
cloud edge. Even higher-angular-resolution observations of additional tracers will
be needed to fully understand this, and to spatially resolve the chemical
stratification expected in the over-dense substructures themselves. We note that if
most of the carbon becomes CO at $A_V \approx 2$ ($N_{\rm H}$ of several $10^{21}$ cm⁻²) in
substructures with gas densities of a few $10^6$ cm⁻³, this depth is equivalent to a
spatial length of several $10^{15}$ cm, or an angular scale of about 0.5″ at the distance
to Orion.

Deeper inside the molecular cloud (δx > 30″), the CO⁺, CH⁺, CH₂⁺ and
CH₃⁺ abundances sharply decrease. The far-ultraviolet flux greatly diminishes, and
the gas and dust grain temperatures accordingly decrease. The HCO⁺ abundance
also decreases until the CO + H₃⁺ → HCO⁺ + H₂ reaction starts to drive the
HCO⁺ formation at low temperatures. Gas-phase atoms and molecules gradually
deplete and dust grains become coated by ices as the far-ultraviolet photon flux is
attenuated at even larger cloud depths (see Extended Data Fig. 1).

Extended Data Figure 1 | Structure of a strongly ultraviolet-irradiated
molecular cloud edge. The incident stellar ultraviolet radiation comes
from the left. The velocities of the advancing ionization and dissociation
fronts are represented by $v_{\rm I}$ and $v_{\rm D}$, respectively. In the Orion Bar, the
dissociation front is at about 15″ (about 0.03 pc) from the ionization front.
UV, ultraviolet; PAH, polycyclic aromatic hydrocarbons. The snow line
refers to the inner cloud layers where molecular gases start to freeze and
dust grains become coated by ices.

Extended Data Figure 2 | Comparison with other tracers. a, ALMA
HCO⁺ J = 4-3 line integrated intensity. b, ALMA CO J = 3-2 line peak
(Orion Bar velocity component). The red contours represent the H¹³CN
J = 1-0 emission (from 0.08 to 0.026 in steps of 0.02 Jy beam⁻¹ km s⁻¹) of
dense condensations inside the Orion Bar. The black contours show the
brightest regions of H₂ v = 1-0 S(1) emission (from 1.5 to 4.5 in steps
of 0.5 × 10⁻⁴ erg s⁻¹ cm⁻² sr⁻¹). The H₂ image is saturated between δx = 19″
and 23″ (that is, no data are shown). Figures have been rotated 127.5°
anticlockwise to bring the incident ultraviolet radiation from the left.

Extended Data Figure 4 | Line velocity centroid, dispersion and
profiles. a, Vertically averaged cuts perpendicular to the Orion Bar in the
HCO⁺ J = 4-3 line velocity centroid (magenta curve) and FWHM velocity
dispersion (grey curve). b, CO and HCO⁺ spectra at representative
positions. The top and middle plots show positions between the ionization
and dissociation fronts, the bottom plot is inside the molecular Orion Bar.
Offsets are given with respect to the rotated images in Extended Data Fig. 2.
The velocity of the background cloud is $v_{\rm LSR} \approx 8.5$ km s⁻¹ (black dashed
line), whereas the velocity of the Orion Bar is $v_{\rm LSR} \approx 11$ km s⁻¹ (green line).

Extended Data Table 1 | Gas pressures and estimated magnetic field strengths
All values are for a non-thermal velocity dispersion of $\sigma_{\rm nth} \approx 1$ km s⁻¹.

Massive radius-dependent flow slippage in carbon nanotubes


Eleonora Secchi¹, Sophie Marbach¹, Antoine Niguès¹, Derek Stein¹,², Alessandro Siria¹ & Lydéric Bocquet¹


Measurements and simulations have found that water moves
through carbon nanotubes at exceptionally high rates owing to
nearly frictionless interfaces. These observations have stimulated
interest in nanotube-based membranes for applications including
desalination, nano-filtration and energy harvesting, yet the
exact mechanisms of water transport inside the nanotubes and
at the water-carbon interface continue to be debated because
existing theories do not provide a satisfactory explanation for the
limited number of experimental results available so far. This lack
of experimental results arises because, even though controlled and
systematic studies have explored transport through individual
nanotubes, none has met the considerable technical
challenge of unambiguously measuring the permeability of a single
nanotube. Here we show that the pressure-driven flow rate through
individual nanotubes can be determined with unprecedented
sensitivity and without dyes from the hydrodynamics of water jets
as they emerge from single nanotubes into a surrounding fluid. Our
measurements reveal unexpectedly large and radius-dependent
surface slippage in carbon nanotubes, and no slippage in boron
nitride nanotubes that are crystallographically similar to carbon
nanotubes, but electronically different. This pronounced contrast
between the two systems must originate from subtle differences in
the atomic-scale details of their solid-liquid interfaces, illustrating
that nanofluidics is the frontier at which the continuum picture of
fluid mechanics meets the atomic nature of matter.

Measuring the pressure-driven flow of water through individual
carbon nanotubes (CNTs) and boron nitride nanotubes (BNNTs)
with well-defined radii ($R_{\rm t}$) and lengths ($L_{\rm t}$) requires overcoming two
considerable challenges. First, when $R_{\rm t}$ decreases to the nanoscale, the
flow rate through a tube drops too rapidly for even state-of-the-art
flow-rate measurements to detect. Flow rates as low as a few picolitres per
second have been measured through single nanocapillaries, but such
a rate is still about three orders of magnitude higher than the sensitivity
required to probe mass flow through a single nanotube. Our approach
avoids this problem by focusing instead on the flow that a fluid jet
entrains outside a nanotube (see Fig. 1) and on the scaling property of
the jet hydrodynamics. The external flow is characterized by a driving
force $F_{\rm P}$ that originates in the fluid momentum transfer at the tube
opening and scales linearly with $R_{\rm t}$, so the flow velocities remain
measurably large even when $R_{\rm t}$ shrinks to nanometre-scale dimensions.
The second challenge is fabricating an experimental system for
manipulating and using a single nanotube, in the form of a nanofluidic needle
with a single nanotube protruding from the tip. To do this, we adapted
a technique for selecting and manipulating nanotubes of known length

Figure 1 | Nanojet experimental set-up. a, SEM image of a CNT insertion
into a nanocapillary (top) and after sealing (bottom). The CNT has
dimensions of ($R_{\rm t}$, $L_{\rm t}$) = (50 nm, 1,000 nm). b, Sketch of the fluidic cell
used to image the Landau-Squire flow set up by nanojets emerging
from individual nanotubes. Red arrows represent the Landau-Squire flow
in the reservoir; orange spheres are tracer particles; z is the optical axis.
c, Left, sketch of a nanotube protruding from a nanocapillary tip. The flow
of water molecules emerging from the nanotube is probed by the tracer
particles. Right, trajectories of individual colloidal tracers in a
Landau-Squire flow field in the outer reservoir. The colour scale quantifies the
velocity v of the tracer particles. The flow is driven by a nanojet from a
CNT with dimensions of ($R_{\rm t}$, $L_{\rm t}$) = (33 nm, 900 nm), with ΔP = 1.7 bar.
Both reservoirs contained water with 10-*M KCl. 
and diameter with a nanomanipulator operating inside a scanning
electron microscope (SEM); see Supplementary Methods 1 and
Supplementary Video 1. We guided a nanotube into the tip of a
laser-pulled glass nanocapillary with an orifice in the range 250-350 nm.
The dimensions of the nanotubes were determined by ionic transport
measurements and by electron microscopy (see Supplementary
Methods 2 and 4). For this study we tested five different CNTs with
dimensions (in nanometres) of ($R_{\rm t}$, $L_{\rm t}$) = (15, 700), (17, 450), (33, 900),
(38, 800) and (50, 1,000), and three different BNNTs with dimensions
(in nanometres) of ($R_{\rm t}$, $L_{\rm t}$) = (23, 600), (26, 700) and (7, 1,300);
see Supplementary Methods 2 and 4 and Supplementary Table 1.

The nanotube at the tip of the glass capillary bridged two macro- 
scopic fluid reservoirs: one inside the capillary and another in the wide, 
transparent flow cell into which the capillary was placed (see Fig. 1b and 
Supplementary Methods 3). We filled both reservoirs with potassium 
chloride (KCl) solutions of a chosen concentration C_s and controlled
pH, and seeded the flow cell with 500-nm polystyrene tracer particles.
We then applied a pressure drop ΔP to the capillary and tracked the
resulting motion of the tracers under a microscope (see Fig. 1b) to map
the velocity profile of the flow (see Figs 1c and 2). Flow measurements
were performed with salt concentration C_s = 10⁻³ M or C_s = 10⁻² M.
Low salinity is required during the tracking experiments to prevent 
salt-induced colloid aggregation. 

Ag/AgCl electrodes inserted into either reservoir were used to
measure the ionic conductance across the nanotube before and after
each fluidic experiment to ensure the integrity of the device, as well as
to obtain information on the dimensions and the surface charge density
of the nanotube (see Supplementary Methods 4). These electrodes were
grounded during flow measurements.

Owing to the needle geometry of the system, the pressure-driven flow
through the nanotube sets up a flow in the outer reservoir called a
Landau-Squire nanojet. The Landau-Squire solution of the Navier-Stokes
equations at low Reynolds number predicts radial and angular
components of the flow velocity of

$$v_r = \frac{F_P \cos\theta}{4\pi\eta r} \quad \text{and} \quad v_\theta = -\frac{F_P \sin\theta}{8\pi\eta r},$$

respectively, where r is the radial distance from the tip, θ is the angle
relative to the symmetry axis of the jet and η is the viscosity. F_P is the
driving force of the jet applied at the origin. Figure 2a, b shows that our
measurements of the flow field around single nanotubes agree well with
the Landau-Squire prediction. The inset of Fig. 2b further highlights the
long-range 1/r-dependence of the Landau-Squire flow, which extends
over tens of micrometres despite the nanometre-scale size of the source
of the flow.

Figure 2 | Measurement of Landau-Squire flows driven from
nanotubes. a, Maps of the velocity field near a CNT with
(R_t, L_t) = (33 nm, 900 nm) for various ΔP as indicated (C_s = 10⁻³ M and
pH 6). b, Magnitude of mean particle velocity v as a function of
r'(θ) = 2r/√(1 + 3cos²θ) for ΔP = 0.5 bar (black), ΔP = 1 bar (blue) and
ΔP = 1.5 bar (red). Dashed lines are fits of the Landau-Squire prediction.
Inset, particle velocity along the jet axis (θ = 0) versus distance from the
nanotube, for ΔP = 0.75 bar (green) and ΔP = 1.7 bar (orange); the
dashed line is a 1/r fit. c, Dependence of F_P/(4πη) on ΔP for CNTs (green
circles) and BNNTs (blue triangles). CNT dimensions (in nanometres) are,
from top to bottom, (R_t, L_t) = (50, 1,000), (33, 900), (38, 800), (15, 700)
and (17, 450); BNNT dimensions (in nanometres) are (R_t, L_t) = (26, 700)
and (23, 600). The salt concentration is C_s = 10⁻³ M, except for the 33-nm
CNT, which was studied at both C_s = 10⁻³ M and C_s = 10⁻² M without a
detectable difference. Dashed green lines are linear fits from which the
permeability was calculated. The orange line indicates the lowest detectable
flow strength. The black dashed line corresponds to the results of a control
experiment using a nanocapillary without a nanotube (see Supplementary
Methods 5). Error bars correspond to the uncertainty in the slope in b,
estimated from at least three measurement replicates at each ΔP.

From our analysis of the Landau-Squire flow, we extracted experimental
values of F_P for each nanotube and ΔP. The results, presented
in Fig. 2c, show a linear relationship between F_P and ΔP. To gain insight
into the permeability of the nanotubes, we begin by observing that the
mass flow rate and F_P are both proportional to ΔP and, hence,
proportional to one another. The viscous origin of F_P at low Reynolds
numbers as well as dimensional considerations motivate the definition
F_P = α η R_t v_NT, where α = O(1) is a geometry-dependent numerical
prefactor and v_NT is the average fluid velocity inside the nanotube. The
permeability of the tube k_NT is defined by v_NT = (k_NT/η)(ΔP/L_t).
Combining these expressions, F_P, k_NT and ΔP are related by

$$F_P(\Delta P) = \alpha \, \frac{R_t \, k_{NT}}{L_t} \, \Delta P.$$
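
The Landau-Squire analysis used above to obtain F_P from tracer velocities can be sketched numerically. The example below evaluates the two velocity components, uses the rescaled distance r'(θ) = 2r/√(1 + 3cos²θ) so that the speed reads F_P/(4πη r'), and recovers F_P by a least-squares fit through the origin. The function names, the viscosity value and the synthetic tracer positions are assumptions made for this illustration and are not taken from the study.

```python
import math

ETA_WATER = 1.0e-3  # Pa s; assumed viscosity of water near room temperature


def landau_squire_velocity(force, r, theta, eta=ETA_WATER):
    """Return (v_r, v_theta) of the low-Reynolds-number Landau-Squire flow."""
    v_r = force * math.cos(theta) / (4.0 * math.pi * eta * r)
    v_theta = -force * math.sin(theta) / (8.0 * math.pi * eta * r)
    return v_r, v_theta


def rescaled_distance(r, theta):
    """r'(theta) = 2 r / sqrt(1 + 3 cos^2 theta), so |v| = F_P / (4 pi eta r')."""
    return 2.0 * r / math.sqrt(1.0 + 3.0 * math.cos(theta) ** 2)


def fit_force(speeds, r_primes, eta=ETA_WATER):
    """Least-squares fit of |v| = slope / r' through the origin; returns F_P."""
    xs = [1.0 / rp for rp in r_primes]
    slope = sum(x * v for x, v in zip(xs, speeds)) / sum(x * x for x in xs)
    return 4.0 * math.pi * eta * slope


# Synthetic example: a hypothetical 1 pN jet sampled at a few tracer positions.
true_force = 1.0e-12  # N (illustrative value only)
positions = [(5e-6, 0.0), (10e-6, 0.5), (20e-6, 1.2), (30e-6, 2.0)]  # (r in m, theta in rad)
speeds, r_primes = [], []
for r, theta in positions:
    v_r, v_theta = landau_squire_velocity(true_force, r, theta)
    speeds.append(math.hypot(v_r, v_theta))
    r_primes.append(rescaled_distance(r, theta))

fitted_force = fit_force(speeds, r_primes)
```

With noise-free synthetic data the fit returns the input force; with real tracer data the same fit would be applied to the measured speeds at each ΔP.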
Figure 3 | Permeability and slip length of individual CNTs and BNNTs.
a, Normalized permeability (k_NT/k_no-slip) of CNTs (green) and BNNTs
(blue) as a function of nanotube radius R_t. The permeability of the BNNT
with R_t = 7 nm was below the experimental detection limit and is indicated
as k_NT = 0 for completeness. Error bars correspond to the experimental
errors on F_P. b, Dependence of the experimentally determined slip length
inside CNTs (green) and BNNTs (blue) on R_t. Error bars correspond to
the uncertainty in the permeability. Salt concentration is C_s = 10⁻³ M,
except for the 33-nm CNT, which was studied at both C_s = 10⁻³ M and
C_s = 10⁻² M without a detectable difference. In both panels, the horizontal
dashed lines indicate the no-slip prediction (k_NT/k_no-slip = 1) and the green
dashed lines are guides to the eye. The error bars on the radius correspond
to the experimental uncertainty in the electric characteristics (see
Supplementary Methods 2 and 4). The values of the slip lengths are
reported in Supplementary Tables 2 and 3.



According to this relation between F_P, k_NT and ΔP, the slope of the
plots in Fig. 2c provides an estimate of the nanotube permeability, so we
can already see that the permeability of CNTs is greatly enhanced as
compared with BNNTs. But, to properly quantify the permeabilities, we
need to know the value of α. We calculated α from the precise
relationship between F_P, v_NT and ΔP that we obtained by numerically
solving the full hydrodynamic Landau-Squire flow. Furthermore, because
α could be sensitive to details of the geometry of the nanotube and the
tip, we repeated our calculations for every nanotube device, taking into
account its particular geometry as measured by SEM (see Supplementary
Methods 6). This exhaustive study, which combines numerical
hydrodynamic calculations with experimental benchmarking using
nanocapillaries, is summarized in Supplementary Methods 5. Our study
showed that α ≈ 0.3 for the nanotube devices considered in Fig. 2a, b,
with only small variations between nanotubes. Having removed all
uncertainty from the value of α, we obtained accurate values for k_NT
from the experimental dependence of F_P on ΔP. Figure 3a presents the
dependence of k_NT on R_t for every nanotube. The permeabilities are
normalized by a simple no-slip reference, k_no-slip = R_t²/8, corresponding
to a nanotube of the same size with a no-slip boundary condition at its
surface. Note that the flow from the smallest BNNT, with R_t = 7 nm, was
below the detection limit.
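
The conversion from a measured slope to a normalized permeability can be sketched as follows. The code inverts F_P = α (R_t k_NT / L_t) ΔP for k_NT and divides by the no-slip reference R_t²/8. The prefactor α = 0.3 is the value quoted in the text; the slope is an invented number chosen only to exercise the arithmetic, and the function names are illustrative.

```python
def permeability_from_slope(slope, radius, length, alpha=0.3):
    """Invert F_P = alpha * (R_t * k_NT / L_t) * dP for k_NT.

    slope  : dF_P/d(dP) in N/Pa, from a linear fit of F_P versus dP
    radius : nanotube radius R_t in m
    length : nanotube length L_t in m
    alpha  : geometry-dependent prefactor (about 0.3 in the text)
    """
    return slope * length / (alpha * radius)


def no_slip_permeability(radius):
    """Poiseuille (no-slip) reference permeability, R_t^2 / 8."""
    return radius ** 2 / 8.0


def normalized_permeability(slope, radius, length, alpha=0.3):
    """Permeability divided by the no-slip reference for the same radius."""
    k_nt = permeability_from_slope(slope, radius, length, alpha)
    return k_nt / no_slip_permeability(radius)


# Hypothetical input: a (33 nm, 900 nm) tube and an invented slope of 1e-17 N/Pa.
R_t, L_t = 33e-9, 900e-9
example_slope = 1.0e-17
k_nt = permeability_from_slope(example_slope, R_t, L_t)
enhancement = normalized_permeability(example_slope, R_t, L_t)
```

A value of `enhancement` equal to 1 corresponds to the no-slip prediction; values above 1 indicate enhanced permeability.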

We attribute the enhanced permeability of the CNTs to hydrodynamic
slippage at the carbon surface. The fundamental way to
account for this is to introduce a slip length b and to apply Navier’s slip 
boundary condition to the fluid at the nanotube surface. We included 
the slip condition in our numerical analysis of the hydrodynamics of 
each nanotube device and obtained experimental b values by matching 
the computed flow rate enhancement due to surface slippage with the 
measured permeability data in Fig. 3a (see Supplementary Methods 6). 
This analysis, which uses the geometry of each nanotube device and 
takes into account hydrodynamic entrance effects at the nanotube ends, 
offers the most accurate estimation of b possible. The permeability 
and b can also be quantitatively obtained from an analytical model 
of hydrodynamic resistances in series, using the Sampson formula to 
account for both Poiseuille flow with slippage inside the nanotube and 
entrance effects; see Supplementary Tables 2 and 3.
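
For orientation, the link between permeability and slip length can be sketched with the textbook result for Poiseuille flow with Navier slip, for which the permeability of a long tube is (R_t²/8)(1 + 4b/R_t). The optional entrance correction adds a Sampson aperture resistance in series, using Sampson's formula Q = R³ΔP/(3η). This is a simplified stand-in for the resistances-in-series model mentioned above, not the authors' numerical analysis; the function names are hypothetical, and the 300 nm slip length is used purely as an illustrative input.

```python
import math


def permeability_with_slip(radius, length, slip_length, entrance=True):
    """Permeability of a tube with Navier slip length b.

    Inside the tube: Poiseuille flow with slip, enhancement (1 + 4 b / R).
    If entrance is True, a Sampson aperture resistance (dP = 3 eta Q / R^3)
    is added in series; this is a simplifying assumption of the sketch.
    """
    enhancement = 1.0 + 4.0 * slip_length / radius
    inverse_k = 8.0 / (radius ** 2 * enhancement)
    if entrance:
        inverse_k += 3.0 * math.pi / (radius * length)
    return 1.0 / inverse_k


def slip_length_from_permeability(k_nt, radius, length, entrance=True):
    """Invert permeability_with_slip for the slip length b."""
    inverse_tube = 1.0 / k_nt
    if entrance:
        inverse_tube -= 3.0 * math.pi / (radius * length)
    if inverse_tube <= 0.0:
        raise ValueError("permeability exceeds the entrance-limited maximum")
    enhancement = 8.0 / (radius ** 2 * inverse_tube)
    return radius * (enhancement - 1.0) / 4.0


# Round trip for a (15 nm, 700 nm) tube with an illustrative b = 300 nm.
R_t, L_t, b = 15e-9, 700e-9, 300e-9
k_nt = permeability_with_slip(R_t, L_t, b)
b_recovered = slip_length_from_permeability(k_nt, R_t, L_t)
```

The sketch shows why entrance effects matter for short, highly slippery tubes: once the in-tube resistance is strongly reduced by slip, the aperture term can dominate the total resistance.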

The peculiar nature of the water-carbon interface inside CNTs is 
revealed in Fig. 3b, which presents the experimentally determined slip 
length as a function of R_t. A first key observation is that the slip length
is strongly radius-dependent, reaching 300 nm inside the smallest CNT 
investigated here. This observation allows us to resolve a long-standing 
debate regarding the large difference in permeabilities reported 
previously using large-scale CNT membranes. The results of those
studies are consistent with a decreasing permeability enhancement 
factor for larger nanotubes, and the range of slip lengths they report is 
fairly compatible with what we have measured. Our results also explain 
why the slip lengths measured previously inside CNTs were consistently 
much larger than the values measured on planar hydrophobic and 
graphite surfaces, for which b is typically a few tens of nanometres at
most. From a theoretical perspective, the transport behaviour of water 
inside CNTs has been the subject of numerous studies, mostly using 
molecular dynamics simulations. Radius-dependent slippage was
predicted inside CNTs with R_t < 10 nm (refs 22, 23) and rationalized
in terms of curvature-dependent friction. The results presented here
confirm the predicted trend, but the measured slip lengths far exceed 
the numerical predictions. This discrepancy suggests that molecular 
dynamics simulations do not represent the interfacial dynamics well at 
a quantitative level, echoing similar limitations encountered in studies 
of slippage at hydrophobic surfaces.

A second key feature of Fig. 3b is the vastly different behaviour of
CNTs and BNNTs, with the latter showing no substantial slippage of 
water. The comparison is illuminating because CNTs and BNNTs have 
the same crystallography, but radically different electronic properties, 
with CNTs being semi-metallic and BNNTs insulating. That these 
nearly identical channels exhibit very different surface flow dynamics 
is unexpected: molecular dynamics simulations using semi-empirical 
interfacial parameters predict similar flow behaviour through CNTs 
and BNNTs. More recent ab initio simulations predict that the
friction of water on carbon surfaces is lower than on boron nitride 
surfaces, but even these predictions strongly underestimate the
difference observed here. The stark differences in flow behaviour 
must therefore originate in subtle atomic-scale details of the solid- 
liquid interface, including the electronic structure of the confining 
material. A more detailed understanding will require a systematic 
theoretical investigation of physico-chemical factors that could affect 
surface friction, such as chemical surface dissociation or specific ion 
adsorption. Useful information could also be gained by measuring the 
slip behaviour in CNTs at high salt concentrations, a regime in which 
the surface charge of CNTs is expected to increase.

The unexpected slippage behaviour inside CNTs and BNNTs points 
to a hitherto not appreciated link between hydrodynamic flow and 
the electronic structure of the confining material. This opens up 
a new avenue for research that could bridge the gap between hard 
and soft condensed matter physics. We also expect that, with further 
improvements in sensitivity, the methods we have developed will enable 
the direct measurement of water transport through biological channels 
such as aquaporins. 
Oxidative diversification of amino acids and
peptides by small-molecule iron catalysis 


Thomas J. Osberger, Donald C. Rogness, Jeffrey T. Kohrt, Antonia F. Stepan & M. Christina White


Secondary metabolites synthesized by non-ribosomal peptide 
synthetases display diverse and complex topologies and possess a 
range of biological activities. Much of this diversity derives from
a synthetic strategy that entails pre- and post-assembly oxidation
of both the chiral amino acid building blocks and the assembled 
peptide scaffolds. The vancomycin biosynthetic pathway is an 
excellent example of the range of oxidative transformations that 
can be performed by the iron-containing enzymes involved in its 
biosynthesis. However, because of the challenges associated with
using such oxidative enzymes to carry out chemical transformations 
in vitro, chemical syntheses guided by these principles have 
not been fully realized in the laboratory. Here we report that
two small-molecule iron catalysts are capable of facilitating the 
targeted C-H oxidative modification of amino acids and peptides 
with preservation of α-centre chirality. Oxidation of proline to
5-hydroxyproline furnishes a versatile intermediate that can be 
transformed to rigid arylated derivatives or flexible linear carboxylic 
acids, alcohols, olefins and amines in both monomer and peptide 
settings. The value of this C-H oxidation strategy is demonstrated 
in its capacity for generating diversity: four 'chiral pool' amino
acids are transformed to twenty-one chiral unnatural amino acids 
representing seven distinct functional group arrays; late-stage 
C-H functionalizations of a single proline-containing tripeptide 
furnish eight tripeptides, each having different unnatural amino 
acids. Additionally, a macrocyclic peptide containing a proline 
turn element is transformed via late-stage C-H oxidation to one 
containing a linear unnatural amino acid. 

A synthetic strategy inspired by non-ribosomal peptide synthetases 
(NRPSs) was envisioned wherein a small-molecule-catalyst-mediated 
C-H oxidation of an amino acid in a monomer or peptide generates 
a versatile synthetic intermediate that may be transformed into 
numerous structural and functional group types with retained optical 
purity. Analogous strategies have successfully employed prefunction- 
alized pluripotent building blocks to generate structurally diverse 
compounds. Limited examples of C-H oxidations of amino acid
derivatives are known and of these few have been demonstrated in
peptides. Chelate-controlled C-H arylations are positionally
limited to amino-terminal residues, and stoichiometric C-H hydroxylation
methods suffer from operational difficulty and modest efficiency,
and have no demonstrated chemoselectivity in peptide settings.
A survey of the possible products of C-H oxidation at the side 
chains of the proteinogenic amino acids led us to reason that target- 
ing hydroxylation at C5 of proline would provide an excellent first 
example of our envisioned strategy (Fig. 1c). Oxidation of proline, a 
biomass chemical, to 5-hydroxyproline (5-HP) furnishes an interme- 
diate having a highly synthetically versatile hemiaminal functional 
group that may be transformed to unnatural amino acids (UAAs) 
and UAA-containing peptides. 5-HP and 5-functionalized proline 
derivatives are currently accessed via multistep synthetic routes from
pre-functionalized glutamic acid or pyroglutamic acid derivatives.
Recently, methods have been developed to furnish α-aryl pyrrolidines
via iron salts or photoredox catalysts and chiral α-nitrile
pyrrolidines via biocatalysis. These α-amine functionalization
methods generally proceed via generation of positively charged nitrogen
via quaternization or amino radical cation formation followed by
decarboxylation, deprotonation or abstraction of the α-hydrogen of the
homolytically and heterolytically weakest C-H bond. On a proline core,
C-H abstraction may occur preferentially at the weaker α-(C2)-H bond
(bond dissociation enthalpy of about 87 kcal mol⁻¹) versus the
α-(C5)-H bond (bond dissociation enthalpy of about 90 kcal mol⁻¹),
leading to racemization. Free hydroxyl radical oxidations and
photoredox-mediated arylations of proline form 2-pyrrolidone and racemic
α-arylated derivatives, respectively.

We sought a method for a direct (C5)-H hydroxylation of proline
that would preserve its C2 stereocentre and those in every amino
acid residue present in peptide settings. Additionally, we sought an
oxidant that would be highly chemoselective for the target residue
over the other amino acid side-chain C-H bonds. For these reasons,
we evaluated the small-molecule non-haem iron catalysts Fe(PDP)
(catalyst 1) and Fe(CF3PDP) (catalyst 2) (Fig. 1b). Such bulky,
electrophilic C-H oxidation catalysts do not discriminate solely on 
the basis of C-H bond dissociation energies, but rather select between 
C-H bonds on the basis of their electronic, steric and stereoelectronic 
properties. This, along with observations of stereoretentive oxidations 
of an isoleucine derivative and dipeptide, suggested that site selectiv- 
ity for C5 proline oxidation was likely, given that C2 is both sterically 
and electronically deactivated. Additionally, in complex-molecule
settings, catalyst 1 was shown to oxidize hyperconjugatively acti- 
vated C-H bonds (for example, ethereal C-H bonds) at faster rates 
than other aliphatic C-H bonds, suggesting that regioselectivity for
α-(C5)-H proline, hyperconjugatively activated by the nitrogen lone
pair, would effectively compete with C-H oxidation of aliphatic amino 
acid residues. 

We began our investigations into this NRPS-inspired strategy with 
the evaluation of the oxidation reactivity of N-(4-nitrophenylsulfonyl)- 
(L)-proline methyl ester (−)-3 with Fe(PDP) (1) (Fig. 2a). Subjection
of (−)-3 to reported slow addition conditions with 1 (25 mol%),
AcOH, and H2O2 at room temperature led to full oxidation at C5 of
proline, affording the glutamic acid derivative (−)-4 in 77% yield,
presumably via over-oxidation of singly oxidized 5-HP as its open- 
chain tautomer. We reasoned that a milder oxidation protocol may 
allow for selective oxidation of proline to the desired 5-HP, and found 
that by lowering the reaction temperature to 0°C and decreasing the 
catalyst loading (iterative addition of 1, 15 mol%), it was possible to 
isolate 5-HP in good yield (62%) (see the Supplementary Information 
for details). A similarly encouraging result was observed with the less 
rigid proline homologue pipecolic acid, affording 6-hydroxylpipecolic 
acid 5 in 53% yield. Interestingly, Boc-proline methyl ester (where
Boc = tert-butoxycarbonyl) gave oxidation to Boc-pyroglutamic
acid methyl ester under the same conditions as the major isolated
product (see the Supplementary Information for details). Gratifyingly,
these experiments resulted in conditions for C5 oxidation of proline
with control of the final oxidation state. Notably, we did not observe
oxidation or racemization of the C2 stereocentre, even under the
forcing conditions used to generate the glutamic acid analogue (−)-4.

We questioned whether in situ derivatization of the hemiaminal
functional group of 5-HP could effect pre-assembly oxidative tailoring
modifications, diversifying proline into non-proteinogenic amino
acids. Arylated proline motifs are prevalent in medicinal agents.
Direct arylation at the 5-position of proline could be effected by
a sequential proline oxidation/arylation procedure: crude 5-HP
generated by Fe(PDP) oxidation is treated with BF3·OEt2 to afford a
highly reactive N-sulfonyl iminium ion intermediate that undergoes
diastereoselective nucleophilic attack by an electron-rich arene
(Fig. 2b). We first explored phenols as arenes in this transformation:
phenol and 2-naphthol adducts (+)-6 and (−)-7 were isolated
in high yields with >20:1 syn-stereochemistry, and with generally
high regioselectivity (3.1:1.0 ortho/para for 6, >20:1 for 7). Using
this oxidative arylation procedure, a novel crosslink (−)-8 between
proline and tyrosine was efficiently forged, reminiscent of the
side-chain crosslinks between amino acids effected by oxidative tailoring
enzymes (for example, vancomycin). Additionally, the intriguing
natural product-amino acid conjugate (+)-9 was produced when the
polyphenol natural product resveratrol was employed as the arene. The
scope of electron-rich arenes is not limited to phenols, as high yields
and selectivities were observed with heteroarenes such as anthrone,
indole, and benzothiophene, affording adducts 10-12. The adducts
were generally formed in syn-stereochemistry (possibly owing to
steric factors introduced by the nosyl group), as confirmed by
single-crystal X-ray diffraction of adducts (−)-7, (+)-11, and (+)-12 (see
the Supplementary Information for details). Interestingly, the anthrone
adduct (−)-10 was furnished as the anti-diastereomer. Overall, this
proline oxidation/arylation procedure efficiently furnishes
stereochemically enriched (>20:1 diastereomeric ratio) 5-arylproline
derivatives, presenting an array of structural features and functional
groups.

Figure 1 | NRPS-inspired strategy for iron-catalysed C-H oxidative
functionalization of amino acids and peptides. a, Oxidative tailoring
iron-enzyme pre- and post-assembly modifications in the biosynthesis
of vancomycin. Iron enzymes diversify tyrosine into the two UAAs
hydroxyphenylglycine and β-hydroxytyrosine, which are incorporated
by the NRPS into a heptapeptide. Post-assembly oxidative tailoring by
iron enzymes effects side-chain cross-linking to afford the vancomycin
core. X = OH or the peptidyl carrier protein; R = H or methyl. b, The
small-molecule non-haem iron C-H oxidation catalysts Fe(PDP) 1 and
Fe(CF3PDP) 2. PDP = [N,N'-bis(2-pyridylmethyl)]-2,2'-bipyrrolidine.
c, Iron catalysts 1 and 2 catalysed pre-assembly oxidative
modification of proline to afford numerous classes of UAAs. Post-assembly
oxidative modifications by 1 and 2 of proline-containing polypeptides to
furnish UAA-functionalized polypeptides. Ns = 4-nitrophenylsulfonyl;
Bn = benzyl; [ox], oxidation; yellow circles indicate sites of oxidative
modification.



To complement the synthetic versatility of 5-HP as a precursor to 
rigid proline derivatives, we envisioned that in situ transformations of 
the open-chain aldehyde tautomer of 5-HP could be a second avenue to 
access a variety of linear UAA structures that remain difficult synthetic 
targets (see the Supplementary Information for details). We developed 
a one-pot approach starting with Fe(PDP) (1) oxidation of (−)-3 to
5-HP followed by either reduction, olefination or reductive amination
to furnish linear terminal hydroxyl-, olefin- or amino-containing UAAs
(Fig. 2c). For example, (−)-3 was transformed to the
5-hydroxy-L-norvaline derivative (+)-13 via Fe(PDP) (1) hydroxylation
followed by in situ reduction with NaBH4. Alternatively, C-H hydroxylation
followed by Wittig olefination of (L)- or (D)-proline furnished the
chiral (L)-2-aminohex-5-enoic acid derivative (+)-15 and its enantiomer
(43% and 40%, respectively) (see the Supplementary Information
for details). Similarly, performing this transformation on the proline
homologue pipecolic acid generated the (D)-2-amino-6-heptenoic acid
derivative (−)-17. The retention of stereochemistry at C2 of proline
(−)-3 over these sequences was established by synthetic derivatization
and comparison of optical activity of products (−)-4, (+)-13, and
(+)-15 to known compounds (see the Supplementary Information for
details). 

Fe(PDP) (1)-catalysed C-H hydroxylation followed by reductive
amination afforded a general method of installing amines to furnish
valuable UAAs, such as the chiral ornithine derivative (+)-19. The
diversity of functionalized secondary and primary amines that may
be used renders this a powerful transformation; for example, using
1-(2-aminopyridyl)-piperazine, a fluorescently labelled aminopyridine
conjugated UAA (−)-21 may be directly generated in an optically active
form. The backbone amine of any suitably protected amino acid may
be used to furnish backbone-to-side-chain linkages such as in the
tryptophan derivative (+)-22. Utilization of less sterically encumbered
primary amines results in reductive amination followed by
intramolecular cyclization to afford optically enriched 3-aminopiperidinone
scaffolds like (+)-23. Notably, additional reactive functionality can be
united with the proline-derived backbone: proline oxidation/reductive
amination with propargylamine furnished alkyne-substituted (+)-24,
which may undergo a Cu-catalysed azide-alkyne cycloaddition to
afford optically enriched triazole (+)-25. Significantly, these UAAs
can be readily denosylated under mild conditions to furnish chiral
amino esters with N-protecting groups common to peptide synthesis
(for example, (+)-14, (+)-16, (−)-18, and (+)-20).

Additionally, we evaluated the generality of this method for the
oxidation of chiral-pool amino acids possessing oxidizable aliphatic
side-chain residues with stronger tertiary and secondary C-H bonds
to enable direct routes to important UAAs (Fig. 2d). For example,
exposure of leucine-, valine-, and L-norvaline-derived substrates
to the reaction conditions with either 1 (tertiary oxidation) or
2 (secondary oxidation) at room temperature resulted in efficient
aliphatic C-H oxidation, affording the tertiary hydroxyl derivatives
(+)-27 and (+)-30 and the 6-oxo derivative (+)-33 in good yields.

Figure 2 | Four amino acids transformed to twenty-one chiral UAAs
via small-molecule iron-catalysed C-H hydroxylations. a, Oxidations
to glutamic acid and 5-HP. Slow addition was as follows: AcOH
(0.5-5 equiv.) was added to a MeCN solution of (−)-3. 1 (0.25 equiv. in
CH3CN, 0.2 M) and H2O2 (5-9 equiv. in CH3CN, 0.4-0.72 M) were added
via syringe pump (75 min) simultaneously. Iterative addition: (−)-3 in
MeCN was cooled to 0 °C. 1 (5 mol%) and AcOH (0.5 equiv.) were added,
followed by dropwise addition (3 min) of 0 °C MeCN solution of H2O2
(1.9 equiv.). The addition of 1, AcOH and H2O2 was repeated twice, every
10 min. Crude 5-HP was passed through a silica gel plug and concentrated
before arylation (in b) or reduction, olefination, or reductive amination
(in c). d, Aliphatic C-H oxidation. *Starting material recycled once.
†dr, diastereomeric ratio, after isolation; o/p, ortho/para; RT, room
temperature; DAST, diethylaminosulfur trifluoride; SI, Supplementary
Information. Details for all methods can be found in the Supplementary
Information. Yields represent the average of two experiments. Yields in
parentheses are average yield per step.
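
The iterative addition protocol described in the Figure 2 caption amounts to three additions in total (one initial addition plus two repeats), which reproduces the 15 mol% total catalyst loading quoted in the main text. A minimal arithmetic sketch, with an illustrative helper name:

```python
def total_loading(per_addition, repeats):
    """Total amount delivered by one initial addition plus `repeats` repeat additions."""
    return per_addition * (1 + repeats)


catalyst_mol_percent = total_loading(5.0, repeats=2)  # catalyst 1, 5 mol% per addition
acoh_equiv = total_loading(0.5, repeats=2)            # AcOH, 0.5 equiv. per addition
h2o2_equiv = total_loading(1.9, repeats=2)            # H2O2, 1.9 equiv. per addition
```

The totals for AcOH and H2O2 follow from the same counting and are derived here, not stated in the source.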


These chiral hydroxylated amino acids are widely used in medicinal 
chemistry and as synthetic intermediates”. Importantly, the ability of 
catalysts 1 and 2 to selectively oxidize aliphatic side-chain C-H bonds 
of amino acids was not diminished when this method was applied to 
dipeptides possessing these residues, as similarly efficient oxidation 
of a leucine and valine residue were observed in these settings, 
see (—)-35 and (+)-38. The tertiary hydroxyl groups in (+)-27, 
(+)-30, and (—)-35 were converted to the fluorinated amino acids 
(+)-28 and (+)-31 and the fluorinated peptide (—)-36. Collectively, 
these results demonstrate a small-molecule-catalysed NRPS pre- 
assembly modification strategy, wherein a simple proline precursor 
and three other amino acids prone to oxidation are converted to 
twenty-one chiral UAAs representing seven distinct functional group 
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Figure 3 | Direct oxidative modification of N-terminal, C-terminal and 
internal proline residues in peptides by small-molecule iron-catalysed 
C-H hydroxylation. a, Chemoselective oxidative modifications of 
N-terminal and C-terminal proline-containing peptides. b, Diversification 
of a tetrapeptide via chemoselective oxidation/functionalization 


arrays: alcohols, fluorines, aryls, carboxylates, olefins, ketones and 
amines. 

Post-assembly oxidative tailoring modifications in the more complex 
setting of a peptide were possible with catalysts 1 and 2 because of high 
functional group tolerance for amides in peptide settings, as well as high 
chemoselectivity in C5 oxidation of proline preferentially over other 
aliphatic C-H oxidations (Fig. 3a). For example, subjecting tripeptide 
(—)-39 to oxidation with 1 at 0°C led to the direct hydroxylation of 
the proline residue, with no observed off-site oxidation at the leucine 
residue. The use of catalyst 2 for proline over-oxidation in peptides 
was superior to catalyst 1, possibly owing to the increased steric bulk 
around the iron centre of 2, which minimizes off-site tertiary oxidation 
and deleterious coordination with the peptide. Underscoring the site 
selectivity and chemoselectivity that can be achieved with catalyst 2, it is 
noteworthy that a +4 change in oxidation state of a methylene carbon in 
(—)-41 to a carboxylic acid in (—)-42 could be effected in the presence 
of an oxidizable tertiary C-H bond of a nearby leucine residue. 

The Fe(PDP) C-H hydroxylation/arylation and reductive amina- 
tion sequences were further tested in a challenging tetrapeptide setting 
(—)-43 that included potentially oxidizable leucine, alanine, and 
tyrosine residues (Fig. 3b). Proline oxidation occurred with high site 
selectivity, and functionalization proceeded to efficiently furnish 
the amine (++)-44 and the naphthol adduct (—)-45. We additionally 
examined the positional flexibility of proline oxidation, and found that 
catalyst 2 controlled over-oxidation of tripeptides containing an inter- 
nal proline and furnished the corresponding glutamic acid derivatives 
(—)-49 and (—)-51 in excellent overall yields (62% and 57% yield, 
respectively; Fig. 3c). The internal proline of tripeptide (—)-46 could 
also be transformed to the amine-containing residue (—)-47 and the 
bishomoserine residue (—)-48 via catalyst 1 oxidation followed by 
either reductive amination or reduction, respectively. 

We sought to test our hypothesis that the ability to selectively install 
5-HP residues into proline-containing precursor peptides with catalyst 1 


R’ =H, R= Me, ()-46 
Boe, R = i-Bu, (-50 R= 


R’ =H, R=Me, ()-49, 62% 

Boc, R =i-Bu, (-)-51, 57% 
sequences. *Starting material recycled once. c, Direct oxidative opening 
of internal proline residues in tripeptides affords UAA- or glutamic-acid- 
containing tripeptides. Yields represent the average of two experiments. 
Yields in parentheses note the average yield per step. All slow additions 
were run with AcOH (0.5 equiv.)/H2O> (5 equiv.). 


or catalyst 2 would enable a small-molecule-catalysed post-assembly 
oxidative strategy, affording late-stage diversification of peptides to 
new structures containing natural or unnatural amino acids (Fig. 4a). 
The tripeptide (−)-39 was subjected to the full suite of proline oxidative
modification reactions to install a phenol (oxidative arylation),
carboxylic acid (controlled over-oxidation with catalyst 2), alkene
(Wittig olefination), alcohol (reduction), and four different amine
functionalities (reductive amination), in good overall yields (average
40%, 63% per step) without observing epimerization of α-C-H bonds
(chiral amino acid analysis of (−)-53 indicated no epimerization to
D-configuration of any residues; see the Supplementary Information
for details). Strikingly, eight novel peptide sequences (52-59) were
rapidly constructed from one peptide in one to two steps, underscoring
the potential for such reactions to enable efficient diversification of
native residues in a preassembled peptide setting. Alternative routes to
make all eight peptides would involve eight separate syntheses from the
respective amino acid building blocks, including the synthesis of UAAs.
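
The relationship between the overall and per-step yields quoted above can be checked with a short calculation. The sketch below assumes that the per-step figure for a multi-step sequence is the geometric mean of the overall yield (an assumption; the averaging convention is not spelled out here), and the function name is purely illustrative.

```python
def per_step_yield(overall_yield: float, n_steps: int) -> float:
    """Geometric-mean yield per step for a sequence with a given overall yield.

    Assumption: every step contributes equally, so
    overall_yield = per_step ** n_steps. Yields are fractions in [0, 1].
    """
    if not 0.0 <= overall_yield <= 1.0:
        raise ValueError("overall_yield must be a fraction between 0 and 1")
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    return overall_yield ** (1.0 / n_steps)


# A 40% overall yield over a two-step sequence, expressed per step.
print(f"{per_step_yield(0.40, 2):.2f}")
```

Under this assumption a 40% overall yield over two steps corresponds to roughly 63% per step, which matches the pair of figures quoted for the tripeptide series.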

Macrocyclic peptides are highly prevalent among NRPS natural
products, and are valued as therapeutic candidates relative to their
linear analogues owing to their increased stability against chemical and
enzymatic degradation, increased receptor selectivity, and pharmacokinetic
properties. We sought to explore how the rapid installation
of new functional groups in peptides from a simple proline residue
could allow for the rapid construction and elaboration of macrocycles.
The phenol-, carboxylic acid- and olefin-derived tripeptides
(52-54, see above) could be rapidly transformed into three macrocycles
containing the ethereal (−)-60, amide (−)-61, and aliphatic (−)-62
linkers, respectively, via short synthetic sequences (Fig. 4a) (see the
Supplementary Information for details). The presence of functional
groups on the linkage of stapled peptide-like structures like these
has been shown to modulate the biological properties of the overall
product. Collectively, the small library of molecules rapidly synthesized
from tripeptide (−)-39 demonstrates the breadth of functionally
and structurally enriched molecules that can be accessed using our
post-assembly oxidative strategy.

Proline has been used by synthetic chemists as a turning element
that helps to bring the ends of a linear peptide together to promote
macrocyclizations. We explored whether the NRPS-inspired C-H
oxidation/functionalization strategy would enable internal proline
residues, which serve as turn elements within a linear peptide sequence,
to be transformed into a range of natural and unnatural acyclic amino
acids. Encouraged by the high positional flexibility of proline oxidation
(see above), we assembled a proline-containing linear pentapeptide
(−)-64, using our pre-assembly modified UAA (+)-15, which was
rapidly produced by C-H oxidation/olefination of proline (−)-3, and
subjected it to ring-closing metathesis, which proceeded in good yield
(61%) to furnish an 18-membered macrocycle. Reduction of the internal
olefin with diimide provided the macrocyclic pentapeptide (−)-65.
Application of the post-assembly C-H oxidation/functionalization with
1 to this macrocycle resulted in the late-stage conversion of the proline
conformational element to a dibenzylornithine derivative (−)-66.
This example underscores the potential for proline residues as diversifiable
structural elements that may be functionally and structurally
transformed at late stages in complex peptide settings.

Figure 4 | Small-molecule iron-catalysed oxidative diversification of
tripeptides and macrocycles. a, Fe(PDP) 1 and Fe(CF3PDP) 2 oxidative
modifications of a single tripeptide enable synthesis of eight functionally
diverse UAA-containing tripeptides. Slow addition was run with
AcOH (0.5 equiv.)/H2O2 (5 equiv.). *Macrocycles 60-62 were prepared
from tripeptides 52-54 using 5-step transformations involving alkene
appendage to the UAA residue, coupling of a fourth alkene-containing
amino acid to the C terminus, conversion of Nosyl to a Boc group,
ring-closing metathesis and hydrogenation. Individual routes vary in
order. See the Supplementary Information for full details. b, Late-stage
diversification of a proline-containing peptide macrocycle via
post-assembly oxidation/reductive amination. †(−)-63 is Boc-Ala-Pro-Ala-
(D)-Allylglycine-OMe. Yields generally represent the average of
two experiments. Yields in parentheses indicate the average yield
per step.


The NRPS-inspired oxidation strategy described herein represents
a powerful method for the direct diversification of amino acids and
peptides via C-H oxidation. We anticipate that this strategy will
benefit small-peptide therapeutics by enabling the rapid exploration
of key physical properties (such as charge, polarity, and steric
and stereochemical effects) and inspire the continued invention of
non-directed, site-selective C-H oxidation reactions that unmask
the potential for the pluripotent reactivity of C-H bonds in complex
molecules.


A progressively wetter climate in southern East
Africa over the past 1.3 million years


T. C. Johnson, J. P. Werne, E. T. Brown, A. Abbott, M. Berke, B. A. Steinman, J. Halbur, S. Contreras, S. Grosshuesch,
A. Deino, R. P. Lyons, C. A. Scholz, S. Schouten & J. S. Sinninghe Damsté


African climate is generally considered to have evolved towards 
progressively drier conditions over the past few million years, 
with increased variability as glacial-interglacial change intensified 
worldwide. Palaeoclimate records derived mainly from northern
Africa exhibit a 100,000-year (eccentricity) cycle overprinted on a 
pronounced 20,000-year (precession) beat, driven by orbital forcing 
of summer insolation, global ice volume and long-lived atmospheric 
greenhouse gases. Here we present a 1.3-million-year-long
climate history from the Lake Malawi basin (10°-14° S in eastern 
Africa), which displays strong 100,000-year (eccentricity) cycles of 
temperature and rainfall following the Mid-Pleistocene Transition 
around 900,000 years ago. Interglacial periods were relatively 
warm and moist, while ice ages were cool and dry. The Malawi 
record shows limited evidence for precessional variability, which 
we attribute to the opposing effects of austral summer insolation and 
the temporal/spatial pattern of sea surface temperature in the Indian 
Ocean. The temperature history of the Malawi basin, at least for the
past 500,000 years, strongly resembles past changes in atmospheric
carbon dioxide and terrigenous dust flux in the tropical Pacific 
Ocean, but not in global ice volume. Climate in this sector of eastern 
Africa (unlike northern Africa) evolved from a predominantly arid 
environment with high-frequency variability to generally wetter 
conditions with more prolonged wet and dry intervals. 

Rainfall is the key metric for eastern African climate; annual temperature
variations are limited, while moisture availability is far less
predictable and profoundly affects distributions of vegetation and habitability
across the landscape. Proxy records of northern and eastern
African palaeoclimate reveal a trend towards drier conditions over the
past few million years, overprinted by Milankovitch-scale cycles tied
to Earth’s orbit about the Sun. However, it is unclear whether this
trend holds for all of Africa. Our objectives were to determine how temperature
and rainfall in the Malawi basin responded to orbital forcing
of summer insolation, and whether they tracked global records of
climate change, such as ice volume and atmospheric carbon dioxide,
as well as the trends of increased aridity and variability observed farther
north in Africa.

Although Lake Malawi (Fig. 1a) is an open basin today, its surface has
dropped well below the elevation of its outlet on numerous occasions,
creating closed-basin conditions with attendant shoreline migration
and elevated water salinity. We analysed sediment samples recovered
by the Lake Malawi Drilling Project to produce records of temperature
(TEX₈₆) and aridity (calcium (Ca) content and leaf wax δ¹³C₃₁, where
C₃₁ indicates an n-alkane containing 31 carbon atoms) extending back
to approximately 1.3 million years before present (Methods). Our age
model for the sediment sequence is based on 15 radiocarbon dates,
3 dated tephra layers, and 6 palaeomagnetic reversals, supplemented by
alignment of the TEX₈₆ temperature record with the LR04 record (the stacked
record of marine benthic foraminifera δ¹⁸O) to assign ages between 75
and 590 thousand years (kyr) ago (Fig. 1c) (Methods).

A cyclic structure is apparent in the TEX₈₆ temperature record after
600 kyr ago (Fig. 2, Extended Data Fig. 1). This is overprinted by considerable
variability throughout the entire sediment sequence; we attribute
this variability to the ill-defined ecology of freshwater Thaumarchaeota,
which produce the TEX₈₆ signal (such as the preferred water depth
for Thaumarchaeota, dominant species in the archaeal community,
seasonal variability), and to inter-seasonal and inter-annual lake
circulation dynamics. The temperature varies with an amplitude of
about 4 °C, between approximately 19 °C and 23 °C, displaying progressively
larger-amplitude glacial-interglacial variations from marine
isotope stage (MIS) 13 (about 500 kyr ago) to MIS 5 (about 125 kyr ago),
after a remarkably cool MIS 14 (about 540 kyr ago) (Fig. 2c).


© 2016 Macmillan Publishers Limited, p


Figure 1 | The location and bathymetry of Lake Malawi, African rainfall
response to the Indian Ocean dipole, and the drill site age model.

Figure 2 | Vegetation (hydroclimate) and temperature history of
the Lake Malawi basin. a, Profile of calcium abundance (n = 22,010),
determined by X-ray fluorescence analysis at about 2-cm intervals,
followed by a 20-point running mean. b, Profile of δ¹³C₃₁ (n = 207); each
sample was run at least in duplicate and co-injected with squalane as an
internal standard to monitor reproducibility of measurements. c, Profile
of corrected temperature (n = 406); roughly 10% of the samples were
run in duplicate, and displayed average differences in TEX₈₆ of 0.0076,
corresponding to a 2σ temperature value of 0.8 °C. d, Profile of LR04
with marine isotope stage (MIS) numbers for glacial periods identified
(n = 950). The heavy grey line in a represents a 100-kyr low-pass filter
through the calcium data, which highlights the progressively wetter mean
climate of the Malawi basin over the past 1.3 million years, as do the δ¹³C₃₁
data. The green bar in the upper left corner of b indicates ±1σ analytical
uncertainty in δ¹³C₃₁. 1σ and 2σ uncertainties in temperature due to
analytical and lapse rate corrections (Methods) are indicated by the light
blue solid and dashed lines, respectively, in c. Blue-shaded bars highlight
some wet intervals, illustrating their correlation with warm temperatures
during interglacial periods. Percentage woody vegetation is estimated from
the δ¹³C₃₁ correlation of ref. 16.


The degree of glacial cooling in the Malawi basin over the past
600 kyr does not match the amplitude of change in global ice volume
as represented by the LR04 record (Fig. 2d). For example, Malawi basin
glacial cooling was only about 2 °C during MIS 12, when continental
ice sheets were particularly extensive, but was 4 °C during MIS 14, when
global ice volume was relatively limited (Fig. 2c, d). However, the Malawi
temperature record more closely matches the atmospheric carbon
dioxide record for the past 600 kyr in the EPICA ice core (Fig. 3a),
and even more closely aligns (R² = 0.363 and 0.658, P < 0.05, on
raw and 100-kyr low-pass-filtered data, respectively) with a 500-kyr
record of terrigenous dust flux to the central equatorial Pacific
(Fig. 3b). These relationships reflect carbon dioxide’s key role in
regulating tropical eastern African temperatures on a glacial-interglacial
timescale, and suggest that atmospheric dust may also have contributed
to temperature regulation on these timescales, despite its modest radiative
impact, estimated at an order of magnitude lower than that of greenhouse
gases.

Calcium concentrations undergo a dramatic change from high-amplitude
variability between calcareous and non-calcareous sediment
(reflecting relatively arid and moist conditions, respectively) before
900 kyr ago to more prolonged periods of non-calcareous sediment
and mainly lower-amplitude variations in the calcium values thereafter
(Fig. 2a). Arid intervals were also longer after 900 kyr ago, including the
most extreme ‘mega-drought’ of the past million years which, if our
tuned age model is correct, had its onset during MIS 6. Intriguingly, dry
conditions when the lake surface was tens to hundreds of metres below
outlet elevation (lowstand conditions) persisted intermittently well into
MIS 5, when a final recovery to hydrologically overfilled conditions
developed (Fig. 2a).

Figure 3 | Correlation of the Malawi temperature record with
atmospheric carbon dioxide and dust. Temperature record for the
Malawi basin (blue data points), plotted with atmospheric carbon dioxide
(green data points) (in a) and the dust flux to the central equatorial
Pacific Ocean as represented by the depositional flux of ²³²Th (red data
points) (in b).

δ¹³C₃₁ displays variability within a range of about 4‰ throughout the
record, superimposed on a trend towards more negative mean values
from around −24.5‰ in the earlier part of the record to around −29‰
in the last 100,000 years. We attribute this trend to a gradual shift to
wetter conditions, and not to a change in catchment (other than its
expansion when lake level dropped). From the perspective of basin
evolution, we observe no evidence for major changes in catchment
geometry (such as drainage capture), or large alterations of topographic
gradients or the deformational regime over the interval of the drill
core. Earlier in the Malawi rift valley’s history, major shifts in catchment
geometry and basin physiography would have occurred, owing to
continental extension. On the basis of the extensive seismic reflection
data set available from the basin, such changes were unlikely in the
most recent 15%-20% or so of the basin’s history represented in our
drill core.

δ¹³C of vegetation can vary considerably, from about −24‰ to
−35‰ for C₃ and from −11‰ to −14‰ for C₄ plants. Nevertheless,
an empirical relationship has been established between δ¹³C of soil
organic matter and fractional cover of woody vegetation (r² = 0.77)
in eastern Africa. This has been extended to δ¹³C of C₃₁ n-alkanes
preserved in lake sediment. We use this relationship to estimate the
vegetation in the Malawi catchment at the time of biomarker deposition.
Our measured values of δ¹³C₃₁ reflect substantial variations in the
grassland cover, with less than 10% woody cover before 900 kyr ago,
wooded grasslands consisting of 10%-40% woody plant cover and well
developed ground cover of grasses and herbs from 900 kyr ago to the
present, and occasional establishment of true woodlands, consisting of
>40% woody plant cover in open or closed stands of trees, shrubs or
thickets in the past 100 kyr (Fig. 2b).

After 900 kyr ago the calcium record displays intervals of
carbonate-free sediment accumulation that lasted several tens of thousands
of years, signifying open lake conditions during humid periods.
These alternated with dry intervals, marked by carbonate accumulation,
of comparable duration. The lake highstands correlate with periods of
more negative δ¹³C₃₁ and relatively warm temperatures (Fig. 2). Thus
the Malawi basin experienced warm, wet interglacials and cooler (by
about 2-4 °C), dry glacial periods, with a roughly 100-kyr periodicity, if
our assumed age model is correct, over the past 900 kyr (Extended Data
Fig. 2). It is noteworthy that during the megadrought initiated in MIS 6,
when Lake Malawi was reduced to a saline lake <100 m deep,
the amplitude of the δ¹³C₃₁ shift was relatively small, about 5‰
(Fig. 2b). However, the relationship between δ¹³C₃₁ and landscape
vegetation is nonlinear, so the relatively small 5‰ shift in δ¹³C₃₁
represents a fourfold decrease in woody vegetation from ~40% to
~10% of the land cover.

Holocene climate shifts linked to precessional forcing are reported
throughout northern Africa and much of tropical eastern Africa
(including the so-called “African Humid Period”), but the anti-phased
response in the Southern Hemisphere (that is, an early Holocene arid
period) appears to have been subdued and intermittent (for example,
an early Holocene drop of about 120 m in the level of Lake Malawi
interspersed with highstands; brief, early Holocene highstands of
Lake Makgadikgadi at about 20° S (ref. 22) and Lake Chilwa at about
15° S (ref. 23)). The apparent lack of an approximately 20-kyr cycle in the
hydroclimate of the Lake Malawi basin over the past 900 kyr (Extended
Data Fig. 2) is consistent with this more recent history.

We attribute the limited precessional signal in the Malawi record
to the opposing effects of orbital influence on summer insolation and
the Indian Ocean Dipole (IOD) pattern of sea surface temperature
(SST) variability. At present, rainfall is enhanced in tropical eastern
Africa but diminished in southern Africa during positive IOD phases
(warm western equatorial Indian Ocean relative to the eastern Indian
Ocean), and vice versa. The IOD shows strong precessional variability,
with the positive phase aligned with Northern Hemisphere summer
insolation (Fig. 4) (Methods). Such a relationship enhances precessional
variability in hydroclimate in the eastern African tropics north
of the Equator because both factors contribute to increased rainfall.
By contrast, the IOD is out of phase with summer insolation in the
Southern Hemisphere, so the two factors have opposite effects on
rainfall and thereby weaken precessional influence.

Lake Malawi is close to the present-day boundary of the dipolar
African rainfall pattern that falls between tropical eastern Africa and
southern Africa (Fig. 1a), with contrasting responses to the IOD.
Tierney et al. report a multidecadal relationship between eastern
African rainfall and a positive IOD that differs from the aforementioned
inter-annual pattern, with less rainfall over Lake Malawi and
other lakes in the rift valley interior and wetter conditions closer to the
eastern African coast and the Horn of Africa. In the Malawi record,
precessional variation in precipitation may have been further obscured
by changes due to migrations of the precipitation pattern boundary;
these would have shifted the Malawi basin between the two IOD rainfall
regimes.

Several interacting mechanisms may have contributed to the long-term
trend towards a wetter climate in the Lake Malawi basin over
the past million years. Aridification in the Horn of Africa has been
attributed to Indian Ocean cooling due to the northward displacement
of New Guinea and narrowing of the Indonesian throughway over the
past 3-4 million years. An overall cooling of the Indian Ocean should
result in a drier hydroclimate throughout eastern Africa, even in the
Malawi basin. However, an anti-phased relationship in hydroclimate
response to the IOD in the interior rift valley versus coastal eastern
Africa and the Horn of Africa suggests not only that the Indian Ocean
became cooler, but that the IOD became progressively less positive
over time. Model results suggest that this would be accompanied by a
weakening of a localized Walker circulation over the Indian Ocean, less
ascending air over the western Indian Ocean and coastal Africa, and
more precipitation in the rift valley, including Lake Malawi.

Figure 4 | Northern Hemisphere summer insolation and the Indian
Ocean SST gradient. The west-minus-east SST gradient in the equatorial
Indian Ocean (IOD) based on alkenone data from two core sites shown in
Fig. 1a (solid blue line), and Northern Hemisphere (at 65° N) summer
insolation (dashed red line) over the past 140,000 years.

Lake Malawi sediment recorded a transition from a highly variable
and predominantly arid climate before 900 kyr ago to a progressively
more humid environment after the Mid-Pleistocene Transition, which
was dominated by 100-kyr cycles consisting of warm, wet interglacial
periods alternating with cooler, drier glacial periods. This shift towards
more humid conditions contrasts with the well documented progression
towards more aridity in northern Africa over the same period, as
recorded in the carbon isotopic composition of soil carbonates and in
dust fluxes to sediments in both the Atlantic Ocean and the Gulf of
Aden (Extended Data Fig. 3). Yet another pattern is shown in a leaf-wax
isotope record of South African vegetation (recovered from the
tropical Atlantic Ocean) that displays a shift in dominant periodicity
from about 40-kyr to about 100-kyr cycles through the Mid-Pleistocene
Transition, as observed in the Malawi record, but with no long-term
trend towards either wetter or drier conditions. This growing body of
evidence attests to a large regional variability in climate history over
the African continent.

Regional differences in hydroclimate undoubtedly influenced the
migration of our human ancestors. As northern Africa became more
arid over the past million years, the Malawi basin evolved towards a
wetter, more hospitable environment, at least during interglacial
times. The Malawi record raises key questions about African climate,
regarding how much of the rift valley shifted to wetter conditions over
the past million years, whether MIS 14 was an unusually cold ice age
throughout the region, and what role precessional forcing had on
hydroclimate to the north of Malawi. Future drilling campaigns on the
East African Great Lakes will offer unique opportunities to address
these questions and to understand the changing landscape where our
ancestors evolved, migrated and advanced their cultures.


Online Content Methods, along with any additional Extended Data display items and
Source Data, are available in the online version of the paper; references unique to
these sections appear only in the online paper.

METHODS

The Lake Malawi Drilling Project recovered a 380-m sediment sequence in 2005
from a water depth of 590 m. Cores MAL05-1B and MAL05-1C of the Drilling
Project, and a nearby piston core M98-13P (Extended Data Table 1) were analysed
for past temperature and rainfall. Seismic reflection profiles used to select the site
portray an undisturbed sedimentary section that was not impacted by erosion,
turbidity currents or mass wasting events. Sediment samples were analysed to
produce records of temperature (TEX₈₆) and aridity (calcium content and leaf
wax δ¹³C).

Palaeo-temperature derived from TEX₈₆. TEX₈₆ is a proxy for temperature in the
upper water column, based on the distribution of glycerol dialkyl glycerol tetraether
(GDGT) membrane lipids of Thaumarchaeota living in the water column.
Thaumarchaeota are ammonia oxidizers that live throughout the aerobic and suboxic
water column, but have been found in many lake systems to have a maximum
abundance just below the thermocline or chlorophyll maximum. Indeed, in
Lake Malawi the maximum abundance of the most labile GDGT produced by
Thaumarchaeota has been identified at 50 m water depth. Nevertheless, the distribution
of their lipids is strongly related to surface water temperature in many lake
systems. The GDGTs used to determine TEX₈₆ are well preserved in sediments
and have been identified intact in sediments as far back as the Cretaceous
Period to estimate past ocean temperatures.

Lipids were extracted from 577 freeze-dried, homogenized sediment samples
by accelerated solvent extraction (Dionex ASE) using hexane/dichloromethane
(DCM) 9:1 (v/v) at 100 °C and 7.6 × 10⁶ Pa to obtain a total lipid extract. The total
lipid extract was then separated into neutral, free fatty acid, and phospholipid
fatty acid fractions using an aminopropylsilyl bond elute column, cleaned before
use with 10 ml successive rinses of methanol followed by 1:1 DCM:2-propanol.
Eight millilitres each of 1:1 DCM:2-propanol, 4% glacial acetic acid in distilled
ethyl ether, and methanol were used with the cleaned columns to elute the neutral,
free fatty acid, and phospholipid fatty acid fractions, respectively. Short column
chromatography with activated alumina as the stationary phase was used to further
separate the neutral fraction into apolar and polar fractions using 9:1 hexane:DCM
followed by 1:1 DCM:methanol as eluents for the two fractions, respectively. The
polar fraction containing the GDGT lipids required for TEX₈₆ analysis was filtered
(0.45 μm filter), dried under N₂, and then redissolved in 99:1 hexane:isopropanol
for analysis. The apolar fraction containing n-alkanes was further separated into
saturated and unsaturated hydrocarbons using Ag⁺-impregnated silica gel column
chromatography as described in ref. 38.

GDGTs were analysed by high-performance liquid chromatography/atmospheric
pressure chemical ionization mass spectrometry (HPLC/APCI-MS), using
an Agilent 1100 series liquid chromatograph with an Alltech Prevail Cyano column
(150 mm × 2.1 mm; 3 μm diameter). Annual lake surface temperatures (ALST)
in degrees Celsius were calculated using the TEX₈₆ ratio of GDGTs and the lake
sediment calibration of ref. 41:


$$\mathrm{ALST} = 49.032\,(\mathrm{TEX}_{86}) - 10.989 \quad (r^{2} = 0.88;\ n = 16)$$


A global marine calibration for TEX₈₆ yields a mean error of 2.5 °C (ref. 42), and
the global lake calibration yields a mean error of 3.6 °C (ref. 41). Although this is
quite large relative to the rather small interannual variation in tropical temperature,
the error of the global calibration is undoubtedly amplified by the differing
composition of communities as well as differences in seasonality and depth habitat
of Thaumarchaeota in different lakes. Within a single lake community, we suspect
that the correlation between TEX₈₆ and ALST is much tighter.
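
As a minimal sketch of how the lake calibration given above turns a TEX₈₆ ratio into a temperature, the snippet below applies the linear relation directly. The function name and the sample TEX₈₆ values are illustrative assumptions, not data from the core.

```python
def alst_from_tex86(tex86: float) -> float:
    """Annual lake surface temperature (deg C) from a TEX86 ratio.

    Applies the linear lake-sediment calibration quoted in the text:
    ALST = 49.032 * TEX86 - 10.989.
    """
    return 49.032 * tex86 - 10.989


# Illustrative TEX86 ratios only (hypothetical, not measurements).
for ratio in (0.62, 0.65, 0.70):
    print(f"TEX86 = {ratio:.2f} -> ALST = {alst_from_tex86(ratio):.1f} deg C")
```

Because the relation is linear, a given difference in TEX₈₆ maps to the same temperature difference anywhere in the calibrated range.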

The sampling interval for TEX₈₆ analyses averaged about 10 cm in the depth
interval of 0-8 metres below lake floor (m.b.l.f.), 50 cm in the depth interval
8-18 m.b.l.f., and 1 m in the depth interval 18-379 m.b.l.f. Roughly 10% of the
samples were run in duplicate, and displayed average differences in TEX₈₆ of
0.0076, corresponding to a 2σ temperature value of 0.8 °C.

Acceptance criteria for TEX₈₆ data. Palaeotemperature data derived from TEX₈₆
analysis may be compromised if the concentrations of the isomers used to calculate
TEX₈₆ are too low, or if a substantial portion of the isoprenoid GDGTs used to
calculate TEX₈₆ are derived from sources other than Thaumarchaeota, such as
soil archaea in the catchment or methanogenic archaea living in the lower water
column. We accepted TEX₈₆ values for our palaeotemperature reconstructions
as long as the following criteria were met:

(1) The signal must be above the limit of quantitation, which in our case
corresponded to integral peak areas >104 for the 1,300 and 1,292 (‘cren’) isomers.

(2) The ratio of GDGT-0/crenarchaeol must be <2.0 to exclude the impact of
methanogenic archaea.

(3) The crenarchaeol isomer must be <10% of the sum of the crenarchaeol and
the crenarchaeol isomer, and GDGT-2 must be <45% of the total GDGTs, to
exclude influences of archaea other than Thaumarchaeota.

(4) The BIT (branched and isoprenoid tetraether) index must be <0.5 to exclude
the impact of soil-derived GDGTs (see refs 11 and 43 for further explanation and
rationale for these criteria).
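
The four criteria above can be sketched as a single screening function. This is an illustration under stated assumptions rather than the authors' processing code: the `GdgtPeaks` record and its field names are hypothetical, all quantities are taken to be integrated peak areas in the same units, the quantitation threshold is passed in by the caller, and the BIT test is applied as an upper bound of 0.5, consistent with the statement that rejected samples were mostly associated with a BIT index >0.5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GdgtPeaks:
    """Hypothetical record of integrated peak areas for one analysis."""
    peak_1300: float    # the '1,300' peak named in criterion (1)
    cren: float         # crenarchaeol, the '1,292' peak
    cren_isomer: float  # crenarchaeol isomer
    gdgt0: float        # GDGT-0
    gdgt2: float        # GDGT-2
    total_gdgts: float  # sum over all GDGTs
    bit_index: float    # branched and isoprenoid tetraether index


def passes_acceptance(s: GdgtPeaks, min_peak_area: float) -> bool:
    """Return True only if all four acceptance criteria are met."""
    if s.cren <= 0.0 or s.total_gdgts <= 0.0:
        return False  # the ratios below would be undefined
    # (1) Both named peaks exceed the limit of quantitation.
    above_loq = s.peak_1300 > min_peak_area and s.cren > min_peak_area
    # (2) GDGT-0/crenarchaeol below 2.0 (methanogen influence).
    low_methanogen = s.gdgt0 / s.cren < 2.0
    # (3) Isomer below 10% of crenarchaeol + isomer; GDGT-2 below 45% of total.
    isomer_ok = s.cren_isomer / (s.cren + s.cren_isomer) < 0.10
    gdgt2_ok = s.gdgt2 / s.total_gdgts < 0.45
    # (4) BIT index below 0.5 (soil-derived GDGT influence).
    low_soil_input = s.bit_index < 0.5
    return above_loq and low_methanogen and isomer_ok and gdgt2_ok and low_soil_input
```

Keeping each criterion as a named boolean makes it straightforward to report which test a rejected sample failed, if that is of interest.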

Using these criteria, we rejected 107 of the 584 analyses for TEX₈₆. The rejected
TEX₈₆ data yielded temperatures ranging between about 4 °C and 38 °C. Nearly
all of them were associated with a BIT index >0.5, and most were warmer than
adjacent temperatures that met our acceptance criteria (Extended Data Fig. 4). The
temperatures derived from the accepted TEX₈₆ measurements fall within a range
of ~18-28 °C (Extended Data Fig. 4).

Almost all of the TEX₈₆ values that were rejected correlate with depths where the
sediments are calcareous and a lake-level index based on sediment composition
indicates relatively arid conditions with the lake at a lowstand. Under such conditions,
the shoreline would have encroached upon the drill site, increasing the likelihood
of terrestrially derived GDGT input from the catchment, a diminished aquatic
production of GDGTs by Thaumarchaeota, proliferation of other archaeal species,
and potential remobilization of previously deposited shallower water sediments.

Temperature correction due to changing lake-surface elevation. The remaining
accepted temperatures were then corrected for the lapse rate effect of the lake’s
major lowstands. Lake Malawi has experienced drops of several hundred metres in
lake level during prolonged arid periods in the past (referred to as “megadroughts”
in ref. 9), and during such times of depressed lake level, water temperature would
have risen owing to the temperature lapse rate alone, independent of any regional
change in temperature. The lake-level history is not known precisely, but is best
represented by principal component 1 (Lyons’ PC1, or LPC1), derived from principal
component analysis of four sediment parameters: natural gamma radiation,
C/N ratio and δ¹³C of bulk organic matter, and reflected colour of the core digital
image. We corrected the TEX₈₆ temperatures by using the linear relationship
between LPC1 and lake-level depression (in metres) that Lyons et al. derived
from direct comparison of LPC1 with the depths of correlative lowstand deltas
identified on seismic reflection profiles:


$$\Delta\mathrm{LL} = -154.18\,(\mathrm{LPC1}) - 247.05 \quad (R^{2} = 0.9628)$$


where ΔLL is the drop in lake level, in metres, from the present-day lake level.
Assuming a moist tropical lapse rate of 6 °C km⁻¹, we multiplied negative values
of ΔLL by 0.006 to obtain the temperature correction to be subtracted from the
original TEX₈₆ temperature to arrive at a temperature for constant lake elevation.
In the few intervals where ΔLL had a positive value (implying lake level higher than
the present lake level), no temperature correction was applied because the lake cannot
rise much above outlet elevation, and the lapse rate effect would be negligible.
We applied the prediction uncertainty range of the Lyons et al. PC1-to-lake-level
relationship to estimate uncertainty in the lapse-rate-based temperature correction.
The positive 2σ prediction uncertainty values were adjusted to not exceed
a lake-surface elevation change ΔLL of greater than zero (that is, during times of
high lake level and overflow). The overall temperature uncertainty was calculated
by adding in quadrature the analytical (2σ of 0.8 °C, described above) and lapse-rate
correction uncertainties and resulted in average and maximum 2σ values of
about 1.0 °C and 1.3 °C, respectively. The correction for lapse rate effect reduces
the average temperature of the record by about 2 °C, but the overall amplitude of
temperature shift and the occurrence of distinct intervals of relatively warm and
relatively cold temperatures remain (Extended Data Fig. 1).
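
The lapse-rate correction and the quadrature sum described above can be sketched as follows. This is an interpretation, not the authors' code: function names are illustrative, and the sign convention is an assumption chosen so that a lowstand lowers the reconstructed temperature by 0.006 °C per metre of lake-level drop, in line with the statement that the correction reduces the average temperature of the record.

```python
import math

LAPSE_RATE_C_PER_M = 0.006   # 6 deg C per km, as assumed in the text
ANALYTICAL_2SIGMA_C = 0.8    # analytical 2-sigma uncertainty quoted in the text


def lake_level_change_m(lpc1: float) -> float:
    """Lake-level change (m) relative to present from the LPC1 regression."""
    return -154.18 * lpc1 - 247.05


def lapse_corrected_temperature(tex86_temp_c: float, lpc1: float) -> float:
    """Adjust a TEX86 temperature to constant (present-day) lake elevation.

    Assumed reading of the method: a lowstand of |dLL| metres warms the lake
    surface by 0.006 * |dLL| deg C, so that amount is removed. A positive dLL
    (lake above its present level) receives no correction.
    """
    dll = lake_level_change_m(lpc1)
    if dll >= 0.0:
        return tex86_temp_c
    return tex86_temp_c - LAPSE_RATE_C_PER_M * abs(dll)


def combined_2sigma(lapse_2sigma_c: float,
                    analytical_2sigma_c: float = ANALYTICAL_2SIGMA_C) -> float:
    """Add analytical and lapse-rate uncertainties in quadrature."""
    return math.hypot(analytical_2sigma_c, lapse_2sigma_c)
```

For example, a lapse-rate 2σ of 0.6 °C combined with the 0.8 °C analytical term gives 1.0 °C, the average overall value reported in the text.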
Palaeo-aridity derived from δ¹³C of leaf-wax n-alkanes and calcium abundance. 
Stable carbon isotopic compositions of C₂₉–C₃₃ n-alkanes derived from fossil leaf 
waxes primarily reflect the relative abundances of C₃ (mostly trees, shrubs and 
herbs) and C₄ (mostly grass) vegetation’®**. An Agilent 6890N gas chromatograph 
(60-m HP-1 column, 0.32 mm inner diameter, 0.25 µm film thickness) interfaced 
to a Thermo Finnigan Delta Plus XP mass spectrometer via a combustion interface 
was used to determine the δ¹³C of n-alkanes. All δ¹³C values are reported as 
per mil deviations from the Vienna Pee Dee Belemnite (VPDB) standard using 
conventional delta notation. The gas chromatograph temperature program begins at 
50 °C and increases at a rate of 50 °C min⁻¹ to 180 °C and next at a rate of 3 °C min⁻¹ 
to 320 °C. The final temperature of 320 °C is held for 6 min. The n-alkanes 
separated by the gas chromatography column are oxidized at 940 °C and converted to 
carbon dioxide. A standard mixture of n-alkanes of known δ¹³C values was analysed 
multiple times daily (‘Mix-A’ of C₁₆–C₃₀ n-alkanes provided by A. Schimmelmann, 
Indiana University); from these replicate measurements, the typical precision of 
the δ¹³C measurements is ±0.5‰ (1σ). Each sample was run at least in duplicate 
and co-injected with squalane as an internal standard to monitor reproducibility of 
measurements. Replicates were analysed and plotted individually for each n-alkane 
sample, with a mean error of ±0.41‰ for C₃₁ duplicates. 
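
As a quick check on the oven program described above, the run time implied by the stated ramps and hold can be computed directly. The sketch below is illustrative only; the function name is hypothetical, and it assumes no initial hold or post-run time beyond what the text states.

```python
def gc_run_time_min(start_c, segments, final_hold_min):
    """Total oven-program time in minutes.

    segments is a list of (ramp rate in °C per min, target temperature in °C).
    """
    total = 0.0
    current = start_c
    for rate_c_per_min, target_c in segments:
        total += (target_c - current) / rate_c_per_min
        current = target_c
    return total + final_hold_min


# Program from the text: 50 °C, then 50 °C/min to 180 °C,
# then 3 °C/min to 320 °C, then a 6-min hold.
run_time = gc_run_time_min(50.0, [(50.0, 180.0), (3.0, 320.0)], 6.0)
```

With these values the two ramps take 2.6 min and about 46.7 min, so each injection occupies roughly 55 min of oven time.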

While C₃ herbs are found in both woodlands and grasslands, δ¹³C of C₃₁ n-alkanes 
in tropical eastern Africa generally reflects the restructuring of the landscape between 
woodlands dominated by C₃ vegetation, indicating relatively humid conditions, and 
C₄ grasslands, representing relatively arid conditions in the lake basin!°. 
The trends of δ¹³C in the C₂₉, C₃₁ and C₃₃ n-alkanes are broadly similar in 
timing and amplitude (Extended Data Fig. 5). Any one of these profiles could have 
been used to reflect the history of vegetation and hydroclimate on the landscape 
surrounding Lake Malawi, and would have been consistent with our interpretation 
of the environmental history of the region. We chose to reflect the terrigenous 
leaf-wax data as δ¹³C₃₁, which is usually the most abundant of the three n-alkanes and 
was chosen by ref. 16 to relate to C₃ and C₄ plant-type abundance in eastern Africa, 
which we utilize in Fig. 2. We find the down-core trend in δ¹³C₃₁ to closely track the 
trend in δ¹³C_wax, the weighted mean average of the three n-alkanes that has been 
reported in previous studies in eastern Africa (see ref. 38) (Extended Data Fig. 5). 
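
A weighted mean of the three homologues can be sketched as follows. This is only an illustration: the text does not state the weights, so weighting by the relative abundance of each n-alkane is an assumption, and the numbers in the example are invented to show the arithmetic, not measured values.

```python
def weighted_mean_d13c(d13c_by_chain, weight_by_chain):
    """Weighted mean δ13C (‰) across n-alkane homologues.

    weight_by_chain is assumed to hold relative abundances (any consistent unit).
    """
    total_weight = sum(weight_by_chain[chain] for chain in d13c_by_chain)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(
        d13c_by_chain[chain] * weight_by_chain[chain] for chain in d13c_by_chain
    )
    return weighted_sum / total_weight


# Hypothetical values, for illustration only.
example_d13c = {"C29": -28.0, "C31": -26.0, "C33": -24.0}
example_abundance = {"C29": 1.0, "C31": 2.0, "C33": 1.0}
example_wax = weighted_mean_d13c(example_d13c, example_abundance)
```

Because C₃₁ is usually the most abundant homologue, it dominates such a mean, which is consistent with the close tracking of the two curves noted above.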

Calcium abundance in bulk sediment, determined by scanning X-ray fluorescence, 
was used as another indicator of past hydroclimate. The calcium signal is 
strongly bi-modal, exhibiting high values in calcareous sediments, which 
accumulate only during lake lowstands”™*, and low values during wetter conditions 
when the lake overflows at its outlet, leaving the water column under-saturated 
with respect to calcite*. 
Refining the age model for the MAL05-1 drilling site. Our age model (Fig. 1c) is 
based initially on 15 radiocarbon dates in the upper 16 m of core, the presence of the 
youngest Toba ash (75 ± 1 kyr ago) at 28.1 m.b.l.f. (ref. 46), Ar–Ar dates on tephra at 
167.8 m.b.l.f. (590 ± 20 kyr ago) and 241.6 m.b.l.f. (915 ± 8 kyr ago), and six 
palaeomagnetic reversals beginning with the Brunhes–Matuyama at 222 m.b.l.f. (ref. 14) 
(Extended Data Fig. 6a, Extended Data Table 2). These dates indicate an average 
sedimentation rate of 0.28 m kyr⁻¹ for the drill site, although they provide no age 
control in the interval between 75 kyr ago (Toba ash horizon) and 590 kyr ago (the 
younger of the two Ar–Ar dates). The corrected temperature versus depth-in-core 
record (Extended Data Fig. 1b) displays a statistically significant 33.6-m cycle, 
which corresponds to an eccentricity period of about 121 kyr, assuming a 
sedimentation rate of about 0.28 m kyr⁻¹ (see statistics in Extended Data Fig. 6b). We note 
that the raw temperature data before correction for lapse rate also displays a 
significant cycle of 38.5 m (bandwidth is 0.0001 and 80% error estimate on the power 
spectrum is 0.626), which indicates that the lapse rate correction is not artificially 
introducing cyclicity into the temperature record. Consequently, we align the TEX₈₆ 
corrected temperature record with the LR04 marine stacked benthic foraminiferal 
δ¹⁸O record to assign ages in the interval 75–590 kyr ago, recognizing that 
the periods of relatively warm temperature must have coincided with interglacials 
and relatively cool temperatures with glacial periods (Extended Data Fig. 6c). 
The resultant age assignments and sedimentation rates based on this alignment 
do not deviate dramatically from the initial age model (Extended Data Fig. 6a). 
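
The conversion from a depth-domain cycle to a period can be reproduced from the linear age-depth fit reported in the caption of Extended Data Fig. 6a (Age = −12.44 + 3.602z, with z in metres). The sketch below is illustrative; the function names are hypothetical, and it assumes that the rounded rate of 0.28 m kyr⁻¹ quoted above is the inverse of that slope.

```python
AGE_INTERCEPT_KYR = -12.44    # from the Extended Data Fig. 6a caption
AGE_SLOPE_KYR_PER_M = 3.602   # from the Extended Data Fig. 6a caption


def age_kyr(depth_m):
    """Age (kyr before present) from the linear age-depth fit."""
    return AGE_INTERCEPT_KYR + AGE_SLOPE_KYR_PER_M * depth_m


def sedimentation_rate_m_per_kyr():
    """Average sedimentation rate implied by the slope of the fit."""
    return 1.0 / AGE_SLOPE_KYR_PER_M


def cycle_period_kyr(wavelength_m):
    """Duration represented by a depth-domain cycle of the given wavelength."""
    return wavelength_m * AGE_SLOPE_KYR_PER_M


rate = sedimentation_rate_m_per_kyr()   # about 0.278 m per kyr
period = cycle_period_kyr(33.6)         # about 121 kyr
```

Using the unrounded slope gives about 121 kyr for the 33.6-m cycle, whereas dividing by the rounded 0.28 m kyr⁻¹ gives exactly 120 kyr, which may explain the figure quoted in the text.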

Employing our age model, we performed Blackman–Tukey spectral analysis on 
the lake-level history derived from LPC1 (Extended Data Fig. 2). A strong spectral 
peak corresponding to an approximately 100-kyr (eccentricity) cycle appears for 
the period from 900 kyr ago to the present, but not before 900 kyr ago. We conclude 
that: (1) the tuned age model is credible, and (2) a statistically significant shift to 
100-kyr cycles in climate variability occurred at about 900 kyr ago (see statistics in 
Extended Data Fig. 2). 
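
For readers unfamiliar with the Blackman–Tukey method, the sketch below shows the textbook form of the estimator: a biased autocovariance is computed up to a maximum lag, tapered with a lag window, and cosine-transformed to give a power spectrum. It is a generic illustration, not the authors' implementation; the Hann lag window, the maximum lag, the function names and the synthetic 100-kyr sine series are all assumptions chosen for the example, and evenly spaced samples are assumed.

```python
import math


def blackman_tukey(series, dt, max_lag):
    """Blackman-Tukey power spectrum of an evenly sampled series.

    Returns (frequencies, power) for frequencies from 0 to the Nyquist limit.
    """
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]
    # Biased autocovariance for lags 0..max_lag.
    acov = [
        sum(x[i] * x[i + k] for i in range(n - k)) / n
        for k in range(max_lag + 1)
    ]
    # Hann lag window (an assumed choice) tapers the autocovariance to zero.
    window = [0.5 * (1.0 + math.cos(math.pi * k / max_lag)) for k in range(max_lag + 1)]
    tapered = [a * w for a, w in zip(acov, window)]
    freqs = [j / (2.0 * max_lag * dt) for j in range(max_lag + 1)]
    power = []
    for f in freqs:
        s = tapered[0] + 2.0 * sum(
            tapered[k] * math.cos(2.0 * math.pi * f * k * dt)
            for k in range(1, max_lag + 1)
        )
        power.append(dt * s)
    return freqs, power


def dominant_period(freqs, power):
    """Period of the strongest non-zero-frequency peak."""
    j = max(range(1, len(freqs)), key=lambda i: power[i])
    return 1.0 / freqs[j]


# Synthetic check: a pure 100-kyr sine sampled every 1 kyr for 600 kyr.
synthetic = [math.sin(2.0 * math.pi * t / 100.0) for t in range(600)]
bt_freqs, bt_power = blackman_tukey(synthetic, dt=1.0, max_lag=200)
bt_period = dominant_period(bt_freqs, bt_power)
```

The maximum lag sets the trade-off between frequency resolution and variance of the estimate, which is why a bandwidth and an error estimate are normally reported alongside the spectrum.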
Calculating the IOD for the past 130 kyr. We examined variation in the IOD on a 
timescale of 10'-10° years by subtracting a 130-kyr alkenone record of SST in the 
eastern Indian Ocean (core GeoB 10038-4: 5° 56.25′ S, 103° 14.76′ E; ref. 29) from 
an alkenone SST record in the western Indian Ocean (core MD85668: 0° 01′ N, 
46° 02′ E; ref. 30) (Extended Data Fig. 7). 
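
Because the two cores are unlikely to share sample ages, a west-minus-east difference needs both records on a common age scale. The sketch below shows one way to do this with linear interpolation; the text does not say how the records were resampled, so the interpolation step, the function names and the toy numbers are all assumptions made for illustration.

```python
def interpolate(ages, values, target_age):
    """Linear interpolation on ascending ages; None outside the sampled range."""
    if target_age < ages[0] or target_age > ages[-1]:
        return None
    for i in range(len(ages) - 1):
        a0, a1 = ages[i], ages[i + 1]
        if a0 <= target_age <= a1:
            if a1 == a0:
                return values[i]
            frac = (target_age - a0) / (a1 - a0)
            return values[i] + frac * (values[i + 1] - values[i])
    return values[-1]


def west_minus_east(west, east, age_grid):
    """SST gradient (west minus east) at each grid age covered by both records.

    west and east are (ages, sst) pairs of equal-length lists.
    """
    gradient = []
    for age in age_grid:
        w = interpolate(west[0], west[1], age)
        e = interpolate(east[0], east[1], age)
        if w is not None and e is not None:
            gradient.append((age, w - e))
    return gradient


# Toy records (ages in kyr, SST in °C); values are invented for illustration.
toy_west = ([0, 10], [26.0, 28.0])
toy_east = ([0, 5, 10], [27.0, 27.0, 29.0])
toy_gradient = west_minus_east(toy_west, toy_east, [0, 5, 10, 15])
```

Grid ages that fall outside either record are skipped rather than extrapolated, so the gradient is only reported where both cores have data.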
Extended Data Figure 1 | Correcting temperature data for lapse rate effect. a, b, Uncorrected TEX₈₆ temperature (a) and temperature corrected for 
lapse rate effect (b) plotted against burial depth at drilling site MAL05-1. The light solid and dashed lines represent the 1σ and 2σ ranges of uncertainty in 
both graphs. c, Lake-level history, from LPC1", which is the basis for the lapse-rate correction to temperature (Methods). 
Extended Data Figure 3 | A contrast in hydroclimate history on the 
African continent. The leaf wax δ¹³C₃₁ record indicates that the Malawi 
basin became progressively wetter since the Mid-Pleistocene Transition 
around 900 kyr ago (a), while much of the continent to the north of 
Lake Malawi maintained a trend towards drier conditions over the past 
three million years or more, as indicated by soil carbonate δ¹³C values in 
northern Tanzania, Kenya and Ethiopia (summarized in ref. 47) (b) and in 
marine sediment records of terrigenous dust input from northern Africa’, 
as shown in ODP Sites 721 and 722 from the Gulf of Aden (c). 
Extended Data Figure 4 | Accepted (blue) and rejected (orange) temperatures and BIT data. Acceptance criteria are explained in the Methods. 
Extended Data Figure 5 | δ¹³C of the C₂₉, C₃₁ and C₃₃ n-alkanes. δ¹³C of the C₂₉, C₃₁ and C₃₃ n-alkanes (a, b, c), and the weighted mean average (WMA) of 
these values (d). 
Extended Data Figure 6 | Aligning the temperature record to LR04 
to refine the age model. a, Age versus depth for drill site MAL05-1, 
depicting ages based on ¹⁴C, tephra, magnetic reversals and alignment 
(tuning) of corrected temperature with LR04. The dashed pink line 
is a linear fit through the dates derived from radiocarbon, tephra and 
magnetic reversals only, described by the equation: Age (in kyr before 
present) = −12.44 + 3.602z (r² = 0.9984). b, Blackman–Tukey analysis 
(spectral power versus per metre) of the corrected temperature data in 
the upper 200 m of drill site MAL05-1, showing a 33.6-m cycle, which 
corresponds to about 121 kyr. Bandwidth is 0.0001 and 80% error estimate 
on the power spectrum is 0.626. c, Temperature plotted against age based 
solely on radiocarbon, tephra and magnetic reversal dates aligned to the 
LR04 age scale (green dashed lines), in order to assign ages in MAL05-1 
between 75 kyr ago (Toba ash horizon) and 590 kyr ago (Ar–Ar). The 
red lines depict tephra and magnetic reversal ages, which constrain the 
temperature alignment. B/M, Brunhes–Matuyama; UJ, Upper Jaramillo; 
LJ, Lower Jaramillo; UCM, Upper Cobb Mountain; LCM, Lower Cobb 
Mountain. Data are from ref. 14. 
Extended Data Figure 7 | The Indian Ocean west-minus-east gradient in SST since 130 kyr ago. Alkenone records of SST in the western Indian 
Ocean (core MD85668: 0° 01′ N, 46° 02′ E) (a)³⁰ and the eastern Indian Ocean (core GeoB 10038-4: 5° 56.25′ S, 103° 14.76′ E) (b)²⁹. The west-minus-east 
temperature gradient (IOD) derived from these two records is displayed in c. 
Extended Data Table 1 | Locations of the cores analysed in this study 

Site       Latitude      Longitude     Water depth (m)   Core length (m) 
M98-13P    11°16′0″ S    34°26′6″ E    604               8.3 
MAL05-1B   11°17′38″ S   34°26′14″ E   590               379.29 
MAL05-1C   11°17′38″ S   34°26′14″ E   590               88.89 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 




Extended Data Table 2 | Sediment dates that underlie the age model of this study 
Digits and fin rays share common developmental histories 


Tetsuya Nakamura¹*, Andrew R. Gehrke¹*, Justin Lemberg¹, Julie Szymaszek¹ & Neil H. Shubin¹ 


Understanding the evolutionary transformation of fish fins into 
tetrapod limbs is a fundamental problem in biology’. The search 
for antecedents of tetrapod digits in fish has remained controversial 
because the distal skeletons of limbs and fins differ structurally, 
developmentally, and histologically”. Moreover, comparisons of 
fins with limbs have been limited by a relative paucity of data on 
the cellular and molecular processes underlying the development 
of the fin skeleton. Here, we provide a functional analysis, using 
CRISPR/Cas9 and fate mapping, of 5’ hox genes and enhancers in 
zebrafish that are indispensable for the development of the wrists 
and digits of tetrapods”. We show that cells marked by the activity 
of an autopodial hoxa13 enhancer exclusively form elements of 
the fin fold, including the osteoblasts of the dermal rays. In hox13 
knockout fish, we find that a marked reduction and loss of fin rays is 
associated with an increased number of endochondral distal radials. 
These discoveries reveal a cellular and genetic connection between 
the fin rays of fish and the digits of tetrapods and suggest that digits 
originated via the transition of distal cellular fates. 

The origin of tetrapod limbs involved profound changes to the distal 
skeleton of fins. Fin skeletons are composed mostly of fin rays®, whereas 
digits are the major anatomical and functional components of the distal 
limb skeleton. One of the central shifts during the origin of limbs in 
the Devonian period involved the reduction of fin rays coincident with 
an expansion of the distal endochondral bones of the appendage’. 
Because the distal skeletons of fins and limbs are composed of different 
types of bone tissue (dermal and endochondral, respectively) it remains 
unclear how the terminal ends of fish and tetrapod appendages are 
related and, consequently, how digits arose developmentally. Although 
the understanding of ectodermal signalling centres in fin buds and fin 
folds has advanced in recent years® |}, that of the cells that form the 
skeletal patterns has remained elusive. 

Hox genes, namely those of the HoxA and HoxD clusters, have figured 
prominently in discussions of limb development and origins*!*~'4. The 
‘early’ and ‘late’ phases of HoxD and HoxA transcription are involved 
in specifying the proximal (arm and forearm) and distal (autopod) 
segments, respectively’. Both fate map assays and knockout pheno- 
types in mouse limbs reveal an essential role for Hox13 paralogues in 
the formation of the autopod*?. Mice engineered to lack Hoxa13 and 
Hoxd13 in limbs lack the wrists and digits exclusively*. Moreover, the 
lineage of cells expressing Hoxa13 resides exclusively in the autopod of 
adult mice’. Together, these lines of evidence reveal the extent to which 
5′ Hox genes are involved in, and serve as markers for, the developmental 
pattern of the wrist and digits. Unfortunately, as no such studies have 
yet been performed in fish, the means to find antecedents of autopodial 
development in fins has been lacking. 

Analyses of 5’ Hox expression in phylogenetically diverse wild- 
type fish!®'° as well as experimental misexpression in teleosts reveal 
that 5’ Hox activity may be involved in patterning”, and defining the 
extent of, the distal chondrogenic region of fish fins”. Despite these 
advances, however, little is known about the contribution of different 
hox paralogues, individually and in combination, to the adult fin 
phenotype and the origin of cells that give rise to the distal fin skeleton. 
While previous studies have shown that osteoblasts of the fin rays in the 
caudal fin of zebrafish are derived from either neural crest or paraxial 
mesoderm, the source of osteoblasts in pectoral fin rays is currently 
unknown”?~*4, Consequently, it remains unclear where the cellular and 
genetic markers of the autopod of the tetrapod limb reside in fish fins. 

In order to bridge these gaps in knowledge, we followed the fates 
of cells marked by early and late phase hox enhancers to adult stages 
in pectoral fins. In addition, we engineered zebrafish that completely 
lacked each individual hox13 gene, and bred stable lines with multiple 
gene knockout combinations of hox paralogues. The power of these 
experiments is twofold: 1) to our knowledge, they represent the first 
functional analyses of hox activity in fins, and 2) they enable a direct 
developmental comparison to experiments performed in tetrapod 
limbs. 

We performed in situ hybridization of hoxa13a, hoxa13b, and 
hoxd13a genes from 48-120 h post fertilization (hpf) in zebrafish to 
determine whether active hox expression has a role in the development 
of the pectoral fin fold. Hoxa13 genes in zebrafish are expressed in the 
distal fin mesenchyme at 48 hpf and weakly in the proximal portion of 
the pectoral fin fold from 72-96 hpf, indicating that hoxa13 genes are 
not actively expressed in the developing fold'® (Fig. 1a, b). Hoxd13a 
is expressed in the posterior half of the fin, but it becomes weak after 
96 hpf (Fig. 1c). Hox expression is entirely absent in fins 10 days post 
fertilization (dpf) (Extended Data Fig. 1). As hox13 genes do not appear 
to have a main role in zebrafish fin fold development past 72-96 hpf, 
we sought to determine what structures hox-positive cells populate in 
the developing and adult folds. 

To follow the fates of cells that experience early phase activity in 
the zebrafish fin, we modified our previously reported transgenesis 
vector”! to express Cre-recombinase driven by the zebrafish early-phase 
enhancer CNS65*°. This enhancer activates expression throughout 
the endochondral disk of pectoral fins from 31 to ~38 hpf (Fig. 2a 
and Extended Data Fig. 1). Stable lines expressing CNS65x3-Cre were 
crossed to the lineage-tracing zebrafish line Tg(ubi:Switch) fish, in 
which cells that express Cre are permanently labelled with mCherry”. 
At 6 dpf mesenchymal cells in which expression was driven by CNS65 at 
38 hpf make up the entire endochondral disk of the pectoral fin (Fig. 2b). 
We also found mCherry-positive cells in the fin fold at 6 dpf and exten- 
sively at 20 dpf (Fig. 2b). These cells contained filamentous protrusions 
extending distally as well as nuclei positioned at the posterior side, 
both of which suggest that the cells were migrating distally out of the 
endochondral disk (Fig. 2b). 

To determine the fate of late phase cellular activity, we employed 
the same fate-mapping strategy but used a late phase hoxa enhancer 
(e16) from the spotted gar (Lepisosteus oculatus) genome”). We chose 
a hoxa enhancer because lineage-tracing data in mouse has shown that 
late phase Hoxa13 cells in the limb make up the osteoblasts of the wrist 
and digits exclusively, making it a bona fide marker of the autopod’. 
Figure 1 | Expression patterns of hox13 genes at 48-120 hpf. a, hoxa13a. 
b, hoxa13b. c, hoxd13a. Hoxa13a is expressed in distal mesenchyme at 
48 hpf, but expression continues in the proximal fin fold from 72 to 96 hpf 
(a). Hoxa13b is expressed in distal mesenchyme and expression can be 
observed at the distal part of the endochondral disk until 96 hpf (b). 
Hoxd13a is expressed in the posterior half of the mesenchyme at 48 hpf 
and expression continues in the posterior endochondral disk through 
96 hpf. After 96 hpf, expression becomes weak (c). Scale bars are 100 µm. 
n = 20 embryos for each in situ hybridization at 48 hpf. n = 10 embryos 
after 72 hpf. 


In addition, gar e16 (which has no sequence conservation in zebrafish) 
drives expression throughout the autopod in transgenic mice in a 
pattern that mimics the endogenous murine enhancer and Hoxa13 
expression”’””, In transgenic zebrafish, gar e16 is active in the distal 
portion of the endochondral disk of the pectoral fin at 48 hpf, and 
ceases activity after approximately 55 hpf (Fig. 2a and Extended Data 
Fig. 1). When these transgenic zebrafish were crossed to Tg(ubi:Switch), 
at 6 dpf we detected the majority of mCherry-positive cells in the 
developing fin fold with a small number of cells lining the distal edge 
of the endochondral disk (Fig. 2c). At 20 dpf, the fin fold contained 
nearly all of the mCherry-positive cells, which had formed tube-like 
cells that appeared to be developing actinotrichia (Fig. 2c). In adult fish 
(90 dpf), late phase cells were restricted to the adult structures of the 
fin fold, where they composed osteoblasts that make up the fin rays, 
among other tissues (Fig. 2d). As the e16 enhancer is active only in 
the distal endochondral disk at 48 hpf, and the labelled cells end up in 
the fin rays of the adult, late phase hox-positive cells are likely to migrate 
from the endochondral portion of the fin into the fin fold, a hypothesis 
supported by extensive filopodia in mCherry-positive cells projecting 
in the direction of the distal edge of the fin (Fig. 2c). 

To explore the function of hox13 genes, we inactivated individual 
hox13 genes from the zebrafish genome by CRISPR/Cas9 and also 
made combinatorial deletions through genetic crosses of stable 
lines (Extended Data Fig. 2 and Extended Data Table 2, 3 and 4). 
Homozygous null embryos for individual hox13 genes exhibited 
embryonic pectoral fins that were comparable in size with the wild 
type at 72 hpf (Extended Data Fig. 3). The shape and size of the fin 
fold and endochondral disk were also assayed by in situ hybridization 
for and1 and shha, which serve as markers for the developing fin fold 
and endochondral disk, respectively *8 (Extended Data Fig. 3). In adult 
fins (~120 dpf), we observed no detectable difference in the length 
of fin rays of hoxd13a−/− mutants when compared to wild-type fish 
(Fig. 3d and Extended Data Fig. 4). However, both hoxa13a−/− and 
hoxa13b−/− single mutant fish retained fin rays that were shorter than 
the wild type, suggesting a role for hoxa13 genes in fin ray development 
(Fig. 3g, j and Extended Data Fig. 4). To determine the degree to which 
endochondral bones were affected, we used CT scanning technology 
Figure 2 | Fate mapping of cells marked by the activity of hox enhancers. 
a, In situ hybridization of Cre in Dr-CNS65x3-Cre and Lo-e16x4-Cre 
exhibits expression dynamics of early and late phase enhancers used 
for fate mapping. Cre regulated by early phase hox enhancer CNS65 
is expressed throughout the fin from 31 to 38 hpf, whereas late phase 
expression (driven by e16) begins weakly in the distal fin at 38 hpf and 
ceases at ~55 hpf. Inset shows zoom in of the pectoral fin, black arrows 
point to the distal border of the endochondral disk. b, Lineage tracing of 
Dr-CNS65x3-Cre at 6 dpf and 20 dpf. Red: mCherry IF; blue: DAPI. Cells 
that experienced early phase expression (red) contribute to fin fold and 
endochondral disk. c, Lineage tracing of Lo-e16x4-Cre at 6 dpf and 20 dpf. 
Cells that underwent late phase expression are present mostly in the fin 
fold, though some cells are at the distal edge of the disk. Red cells at 6 dpf 
protrude filopodia in the distal direction, indicating that these cells are 
actively moving out into the fin fold. d, Lineage tracing of late phase hox 
cells in adult zebrafish fin (~120 dpf). mCherry cells are present only in 
the derivatives of the fin fold, and not in the endochondral disk. Inset: 
magnification of distal edge of fin rays. Green: Zns-5 osteoblast marker; 
red: Hox-positive; yellow: overlap of Zns-5 and Cre. White dotted lines 
outline the fin (b, c) or endochondral bones (d). n = 5 for stable lines. 
All scale bars are 100 µm except for the total fin in d, which is 500 µm. 


for wild-type and mutant adult fish. Each single mutant, hoxa13a−/−, 
a13b−/− or d13a−/−, had four proximal radials and 6-8 distal radials 
with similar morphology to those of wild-type zebrafish (Fig. 3c, f, i, l 
and Extended Data Fig. 4). We crossed heterozygous mutants to 
obtain fish that lacked all hoxa13 genes (hoxa13a−/−, a13b−/−). The 
fin folds of hoxa13a−/−, a13b−/− embryos were ~30% shorter than 
the wild type at 72 and 96 hpf, whereas the number of cells in the 
endochondral disk was ~10% greater (Extended Data Fig. 5). Adult 
Figure 3 | Adult fin phenotypes of hox13 deletion series. a-c, 
wild type. d-f, hoxd13a−/−. g-i, hoxa13b−/−. j-l, hoxa13a−/−. 
m-o, hoxa13a−/−, a13b−/−. p-r, hoxa13a−/−, a13b and d13a−/− (mosaic 
triple knockout; Methods and Extended Data Tables 3, 4). Each mutant 
hox sequence is found in Extended Data Tables 3, 4. a, d, g, j, m, p, Alizarin 
Red and Alcian Blue staining of pectoral fin. b, e, h, k, n, q, CT scanning 
of pectoral fins. Black: radials (endochondral bones); grey: fin rays 
(dermal bones). Note that hoxa13 single (g, h, j, k), double (m, n), and 
mosaic triple (p, q) mutant fins show shorter fin rays than wild type 
(a, b). Fins were scaled according to the bone staining pictures. c, f, i, l, o, 
r, Enlarged images of CT scanning without fin rays to reveal endochondral 
patterns. Dark grey: proximal radials; red: distal radials. Upper left side is 
the anterior and bottom right is the posterior side in each picture. Double 
and triple knockout mutants have 10-13 distal radials (o and r; Extended 
Data Fig. 4, Supplementary Information). Third and fourth proximal 
radials started to fuse into one bone in hoxa13a−/−, a13b−/− (o). Note that 
posterior distal radials are stacked along proximodistal axis (o). Posterior 
proximal radials are broken down into small parts in mosaic triple 
knockout (r). Scale bars are 2 mm. The sizes of specimens are not scaled in 
c, f, i, l, o and r to display the detail of distal radials. n = 3 fish for single 
and double mutants and n = 5 fish for mosaic triple mutant. 


hoxa13a−/−, a13b−/− fish exhibited greatly reduced fin rays (Fig. 3m, 
Extended Data Fig. 4 and Supplementary Information). In contrast to 
dermal reduction, the endochondral distal radials of double mutants 
were significantly increased to 10-13 in number, often stacked along the 
proximodistal axis (Fig. 3o, Extended Data Fig. 4 and Supplementary 
Information, P = 0.0014, t-test comparing the means). A similar pattern 
was seen in triple knockout fish (mosaic for hoxa13b and hoxd13a) 
(Fig. 3p-r and Extended Data Fig. 4) along with altered proximal radials, 
implying that late phase hox genes are involved in patterning the 
proximal endochondral radials of fins, unlike their role in tetrapods (Fig. 3). 

Despite being composed of different kinds of skeletal tissue, fin rays 
and digits share a common population of distal mesenchymal cells 
that experience late phase Hox expression driven by shared regulatory 
architectures and enhancer activities”!. In addition, loss of 5’ Hox 
activity results in the deletion or reduction of both of these structures. 
Whereas phylogenetic evidence suggests that rays and digits are not 
homologous in terms of morphology, the cells and regulatory processes 
in both the fin fold and the autopod share a deep homology that may 
be common to both bony fish and jawed vertebrates’®. 
Figure 4 | Shared developmental histories in fin rays and digits. In mice 
(top row), late phase Hox expression (red) marks the distal cells of the 
limb bud that result in bones of the autopod (wrists and digits). Double 
knockout of Hoxa13 and Hoxd13 results in the loss of the autopod. In 
zebrafish wild-type fins (middle row), cells marked by late phase hox 
expression (red) end up in the fin fold and within osteoblasts of the 
dermal rays. Hoxa13 double knockout fish (hoxa13a−/−, a13b−/−) and 
the triple knockout (mosaic for hoxa13b and hoxd13a) have extremely 
reduced fin rays with increased distal endochondral radials. Note that 
distal radials are stacked along the proximodistal axis in the posterior 
of the fins. The results lead to the hypothesis (bottom row) that the 
knockout phenotype results from a deficit in migration of mesenchymal 
cells with more cells left in the distal fin bud (increased number of cells in 
the endochondral disk of mutant fins, Extended Data Fig. 4) and fewer 
migrating to the fold, thereby resulting in a larger number of endochondral 
bones and reduced dermal ones. Red cells: cells that experienced late phase 
hox expression. Mouse limbs consist of only endochondral bones, but 
fish fins contain endochondral (black) and dermal (transparent; fin rays) 
bones. 


Two major trends underlie the fin-to-limb transition—the 
elaboration of endochondral bones and the progressive loss of the 
extensive dermal fin skeleton?””°. In the combinatorial knockouts of 
hox13 genes, which in tetrapods result in a loss of the autopod, distal 
endochondral radials were increased in number while fin rays were 
greatly reduced. As a common population of cells in the distal append- 
age is involved in the formation of rays and digits, the endochondral 
expansion in tetrapod origins may have occurred through the transition 
of distal cellular fates and differential allocation of cells from the fin fold 
to the fin bud’ (Fig. 4). The two major trends of skeletal evolution in 
the fin-to-limb transition may be linked at cellular and genetic levels. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 

All zebrafish work was performed according to standard protocols approved by 
The University of Chicago (ACUP #72074). No statistical methods were used to 
predetermine sample size. The experiments were not randomized and the investi- 
gators were not blinded to allocation during experiments and outcome assessment. 
Whole-mount in situ hybridization. In situ hybridization for the hox13, Cre, and1 
and shha genes was performed according to standard protocols” after fixation 
in 4% paraformaldehyde overnight at 4 °C. Probes for hox13 and shha were as 
previously described!*. Primers to clone Cre and and1 into vectors can be found 
in Extended Data Tables 1 and 2. Specimens were visualized on a Leica M205FA 
microscope. 

Lineage tracing vector construction. In order to create a destination vector for 
lineage tracing, we first designed a random sequence of 298 bp that contained 
a SmaI site to be used in downstream cloning. This sequence was ordered as a 
gBlocks fragment (IDT) and ligated into the pCR8/GW/TOPO TA cloning 
vector (Invitrogen). We then performed a Gateway LR reaction according to the 
manufacturer's specifications between this entry vector and pXIG-cFos-GFP, 
which abolished an NcoI site present in the gateway cassette and introduced a SmaI 
site. We then removed the GFP gene from the destination vector with NcoI and BglII 
and ligated in Cre (primers in Extended Data Table 1), using the ‘pPCR8GW- 
Cre-pA-FRT-kan-FRT’ (kind gift of M. L. Suster, Sars International Center for 
Marine Molecular Biology, University of Bergen, Bergen, Norway) as a template 
for Cre PCR and Platinum Taq DNA polymerase High Fidelity (Invitrogen). In 
order to add a late phase enhancer to this vector, we first ordered four identical 
oligos (IDT gBlocks) of the core e16 sequence from gar, each flanked by different 
restriction sites. Each oligo was then ligated into pCR8/GW/TOPO, and 
sequentially cloned via restriction sites into a single pCR8/GW/TOPO vector. This entry 
vector was used as a template to PCR the final Lo-e16x4 sequence and ligate it into 
the Cre destination vector using XhoI and SmaI, creating Lo-e16x4-Cre. The early 
phase enhancer Dr-CNS65x3 was cloned into the destination vector using the same 
strategy. Final vectors were confirmed by sequencing. A full list of sequences and 
primers used can be found in Extended Data Table 1. 

Establishment of lineage tracing lines. *AB zebrafish embryos were collected 
from natural spawning and injected according to the Tol2 system as described 
previously”". Transposase RNA was synthesized from the pCS2-zT2TP vector using 
the mMessage mMachine SP6 kit (Ambion)”!. All injected embryos were raised 
to sexual maturity according to standard protocols. Adult F0 fish were outcrossed 
to wild-type *AB, and the total F1 clutch was lysed and DNA isolated at 24 hpf for 
genotyping (see Extended Data Table 1 for primers) to confirm germline 
transmission of Cre plasmids in the F0 founders. Multiple founders were identified and 
tested for the strongest and most consistent expression via antibody staining and 
in situ hybridization. One founder fish was identified as best, and all subsequent 
experiments were performed using offspring of this individual fish. 

Lineage tracing crossing and detection. Founder Lo-e16x4-Cre and 
Dr-CNS65x3-Cre fish were crossed to the Tg(ubi:Switch) line (kind gift from 
L. I. Zon). Briefly, this line contains a construct in which a constitutively active 
promoter (ubiquitin) drives expression of a loxP flanked GFP protein in all cells 
of the fish assayed. When Cre is introduced, the GFP gene is removed and the 
ubiquitin promoter is exposed to mCherry, thus permanently labelling the cell. 
We crossed our founder Cre fish to Tg(ubi:Switch) and fixed progeny at different 
time points to track cell fate. In order to detect the mCherry signal, embryos or 
adults were fixed overnight in 4% paraformaldehyde and subsequently processed 
for whole-mount antibody staining according to standard protocols*’ using the 
following antibodies and dilutions: 1st rabbit anti-mCherry/DsRed (Clontech 
#632496) at 1:250, 1st mouse anti-Zns-5 (Zebrafish International Resource Center, 
USA) at 1:200, 2nd goat anti-rabbit Alexa Fluor 546 (Invitrogen #A11071) at 1:400, 
2nd goat anti-mouse Alexa 647 (Invitrogen #A21235) at 1:400. Stained zebrafish 
were mounted under a glass slide and visualized using an LSM 710 confocal micro- 
scope (Organismal Biology and Anatomy, the University of Chicago). Antibody 
stains on adult zebrafish (90 dpf) fins were imaged on a Leica SP5 II tandem 
scanner AOBS Laser Scanning Confocal (the University of Chicago Integrated 
Light Microscopy Core Facility). 

CRISPR/Cas9 design and synthesis. Two mutations were simultaneously 
introduced into the first exon of each hox13 gene by the CRISPR/Cas9 system as previously 
described in Xenopus tropicalis*'. Briefly, two gRNAs that match the sequence of 
exon 1 of each hox13 gene were designed by ZiFiT (http://zifit.partners.org/ZiFiT/). 
To synthesize gRNAs, forward and reverse oligonucleotides that are unique for 
individual target sequences were synthesized by Integrated DNA Technologies, 
Inc. (IDT). Each oligonucleotide sequence can be found in Extended Data Table 2. 
Subsequently, each forward and reverse oligonucleotide were hybridized, and 
double stranded products were individually amplified by PCR with primers that 
include a T7 RNA promoter sequence, followed by purification by NucleoSpin 
Gel and PCR Clean-up Kit (Macherey-Nagel). Each gRNA was synthesized 
from the purified PCR products by in vitro transcription with the MEGAscript 
T7 Transcription kit (Ambion). Cas9 mRNA was synthesized by mMESSAGE 
mMACHINE SP6 Transcription Kit according to the manufacturer's instructions 
(Ambion). 
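
The oligo design described above can be illustrated with a short sketch. It uses the hoxa13a gRNA1 forward oligo and the common reverse oligo as listed in Extended Data Table 2. The split of the forward oligo into a shared 5' portion, a 20-nt target and a shared 3' portion is our reading of the table, and the function and constant names are hypothetical. The sketch checks that the 3' end of the forward oligo is complementary to the 3' end of the common reverse oligo, which is what allows the two to be hybridized.

```python
# Shared 5' portion of the forward oligos (contains the T7 promoter sequence).
FIVE_PRIME_COMMON = "AATTAATACGACTCACTATA"
# Shared 3' portion of the forward oligos (start of the gRNA scaffold).
THREE_PRIME_COMMON = "GTTTTAGAGCTAGAAATAGC"
# Common reverse oligo ('zebra gRNA_R' in Extended Data Table 2).
COMMON_REVERSE = (
    "AAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCC"
    "TTATTTTAACTTGCTATTTCTAGCTCTAAAAC"
)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def forward_oligo(target):
    """Assemble a forward oligo: shared 5' portion + target + shared 3' portion."""
    return FIVE_PRIME_COMMON + target + THREE_PRIME_COMMON


# 20-nt target of hoxa13a gRNA1, read from the table entry.
hoxa13a_grna1 = forward_oligo("GGGCAATCACAACCAGTGGA")

# The last 20 nt of the reverse oligo should pair with the 3' end of the forward oligo.
overlap = reverse_complement(COMMON_REVERSE[-20:])
anneals = hoxa13a_grna1.endswith(overlap)
```

Under these assumptions `anneals` is true, so the two oligos share a 20-nt complementary region from which the double-stranded template can be completed.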

CRISPR/Cas9 injection and mutants selection. Two gRNAs targeting exon 1 of 
each hox gene were injected with Cas9 mRNA into zebrafish eggs at the one-cell 
stage. We injected ~2 nl of the injection solution (5 µl solution containing 1,000 ng 
of each gRNA and 500 ng Cas9 diluted in nuclease-free water) into the single cell of 
the embryo. Injected embryos were raised to adulthood, and at three months were 
genotyped by extracting DNA from tail clips. Briefly, zebrafish were anaesthetized 
by Tricaine (0.004%) and tips of the tail fin (2–3 mm²) were removed and placed 
in an Eppendorf tube. The tissue was lysed in standard lysis buffer (10 mM Tris 
pH 8.2, 10 mM EDTA, 200 mM NaCl, 0.5% SDS, 200 µg/ml proteinase K) and 
DNA recovered by ethanol precipitation. Approximately 800-1,100 bp of exon 
1 from each gene was amplified by PCR using the primers described in Extended 
Data Table 2. To determine whether mutations were present, PCR products were 
subjected to T7E1 (T7 endonuclease1) assay as previously reported**. After 
identification of mutant fish by T7E1 assay, detailed analysis of mutation patterns 
was performed by sequencing at the Genomics Core at the University of Chicago. 

Establishment of hox13 single and double mutant fish. Identified mutant fish 
were outcrossed to wild type to select frameshift mutations from mosaic mutational 
patterns and establish single heterozygous lines. Obtained embryos were raised to 
adults (~3 months), then analysed by T7E1 assay and sequenced. Among a variety 
of mutational patterns, fish that have frameshift mutations were used for assays as 
single heterozygous fish. We obtained several independent heterozygous mutant 
lines for each hox13 gene to compare the phenotype among different frameshift 
mutations. To obtain hoxa13a+/−, hoxa13b+/− double heterozygous mutant fish, 
each single heterozygous mutant line was crossed with the other mutant line. 
Offspring were analysed by T7E1 assay and sequenced after three months, and 
double heterozygous mutant fish were selected. To generate double homozygous 
hoxa13 mutant embryos and adult fish (hoxa13a−/−, hoxa13b−/−), double hete- 
rozygous fish (hoxa13a+/−, hoxa13b+/−) were crossed with each other. The ratio 
of each genotype from crossing heterozygous fish is summarized in Extended Data 
Table 4. 

Genotype of single (hoxa13a−/− or hoxa13b−/−) or double (hoxa13a−/−, 
hoxa13b−/−) mutant by PCR. After mutant lines were established, single (hoxa13a 
or hoxa13b) or double (hoxa13a, hoxa13b) mutant embryos and adult fish were 
genotyped by PCR for each analysis. Primer sequences for PCR are listed in 
Extended Data Table 2. To identify an 8 bp deletion in exon 1 of hoxa13a, the PCR 
product was treated by AvaI at 37 °C for 2 h, because the 8 bp deletion produces 
a new AvaI site in the PCR product (‘zebra hoxa13a_8 bp del’ primers, wild type; 
231 bp, mutant; 111 bp and 119 bp). Final product size was confirmed by 3% 
agarose gel electrophoresis. To identify a 29 bp deletion in exon 1 of hoxa13a, the 
PCR product was confirmed by gel electrophoresis (‘zebra hoxa13a_29 bp del’ 
primers, wild type; 110 bp, mutant; 81 bp). To identify a 14 bp insertion in exon 1 
of hoxa13b, the PCR product was treated by BccI at 37 °C for 2 h, because the 14 bp 
insertion produces a new BccI site in the PCR product (‘zebra hoxa13b_14 bp 
ins’ primers, wild type; 98 bp, mutant; 53 bp + 57 bp). The final product size was 
confirmed by 3% agarose gel electrophoresis. The details of the mutant sequence 
are summarized in Extended Data Table 3a-c. 
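
As a rough illustration of how the band sizes above translate into genotype calls, the following sketch maps an idealized set of observed bands to a genotype. The band sizes are those stated in the text. The dictionary layout, the assay labels and the function name are hypothetical, and the sketch assumes every expected band is resolved on the gel, which may not hold for fragments of similar size.

```python
# Expected band sizes (bp) after PCR and, where stated, restriction digest.
# Sizes come from the Methods text; the assay labels are illustrative.
EXPECTED_BANDS = {
    "hoxa13a_8bp_del": {"wild_type": {231}, "mutant": {111, 119}},  # AvaI digest
    "hoxa13a_29bp_del": {"wild_type": {110}, "mutant": {81}},       # no digest
    "hoxa13b_14bp_ins": {"wild_type": {98}, "mutant": {53, 57}},    # BccI digest
}


def call_genotype(assay, observed_bands):
    """Return '+/+', '+/-' or '-/-' from the set of band sizes seen on the gel."""
    expected = EXPECTED_BANDS[assay]
    observed = set(observed_bands)
    has_wild_type = expected["wild_type"] <= observed
    has_mutant = expected["mutant"] <= observed
    if has_wild_type and has_mutant:
        return "+/-"
    if has_wild_type:
        return "+/+"
    if has_mutant:
        return "-/-"
    raise ValueError(f"unexpected band pattern for {assay}: {sorted(observed)}")


# Example: an AvaI-digested product showing both cut and uncut bands.
example_call = call_genotype("hoxa13a_8bp_del", [231, 119, 111])
```

A heterozygote carries both the uncut wild-type band and the cut mutant bands, so it is recognized by the presence of all three sizes.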

Combination of stable and transient deletion of all hox13 genes by CRISPR / 
Cas9. Two gRNAs targeting exon 1 of hoxa13b and two gRNAs targeting exon 1 
of hoxd13a were injected with Cas9 mRNA into zebrafish one-cell eggs that were 
obtained from crossing hoxa13a+/− and hoxa13a+/−, hoxa13b+/−, hoxd13a+/− 
(the gRNAs were the same as those used to establish single hox13 knockout fish 
and are listed in Extended Data Table 2). Injected eggs were raised to adult fish and 
genotyped by extracting DNA from tail fins. PCR products of each hox13 gene 
were cloned into pCRII-TOPO (Invitrogen) and deep sequencing was performed 
(Genomic Core, the University of Chicago). At four months old, skeletal staining 
and CT scanning were performed to analyse the effect of triple gene deletions. 
The knockout ratios of each hox13 allele were calculated from the results of deep 
sequencing. 
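
The calculation of allele ratios can be sketched as follows, assuming that each percentage is simply the fraction of sequenced clones carrying a given allele. The clone calls below are hypothetical and were chosen only so that the fractions are sevenths, similar in form to the values listed for hoxd13a in Extended Data Table 3f. The function name is illustrative.

```python
from collections import Counter


def allele_percentages(clone_calls):
    """Percentage of each allele among the sequenced clones of one PCR product."""
    counts = Counter(clone_calls)
    total = sum(counts.values())
    return {allele: 100.0 * n / total for allele, n in counts.items()}


# Hypothetical clone calls for one gene in one mosaic fish (7 clones, none wild type).
hoxd13a_clones = [
    "6 bp del", "4 bp ins", "4 bp ins", "8 bp ins",
    "170 bp del", "3 bp ins", "+8 -4 bp",
]
percentages = allele_percentages(hoxd13a_clones)

# Percentage of normal (wild-type) alleles; zero if no wild-type clone was recovered.
normal_allele_percent = percentages.get("wild type", 0.0)
```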

Measurement of the fin fold length. Embryos were obtained by crossing 
hoxa13a+/−, hoxa13b+/− to each other and raised to 72 hpf or 96 hpf. After fix- 
ation by 4% PFA for 15 h, caudal halves were used for PCR genotyping. Pectoral 
fins of wild type and hoxa13a−/−, hoxa13b−/− were detached from the embryonic 
body and placed horizontally on glass slides. The fins were photographed with a 
Leica M205FA microscope, and the fin fold length along the proximodistal axis at 
the centre of the fin was measured using ImageJ. The resulting data were analysed 
by t-test comparing the means. 
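
For readers who wish to reproduce this kind of comparison, a minimal sketch of a two-sample t statistic is given here. It assumes the pooled-variance (Student's) form; the Methods state only that a t-test comparing the means was used. The measurements are hypothetical and are not the values in the Source Data.

```python
import math
import statistics


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df


# Hypothetical fin fold lengths in mm (not the measured values).
wild_type = [0.16, 0.17, 0.15, 0.18, 0.16]
double_mutant = [0.11, 0.12, 0.10, 0.12]
t_stat, dof = student_t(wild_type, double_mutant)
```

The P value would then be read from the t distribution with `dof` degrees of freedom, using one or two tails as stated in the relevant figure legend.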

Counting the cell number in endochondral disk. Embryos were obtained by 
crossing hoxa13a+/−, hoxa13b+/− to each other and raised to 96 hpf. After fixation 
by 4% PFA for 15 h, caudal halves were used for PCR genotyping. Wild type 
and hoxa13a−/−, hoxa13b−/− embryos were stained by DAPI (1:4,000 in 
PBS-0.1% Triton) for 3 h and washed for 3 h by PBS-0.1% Triton. Pectoral fins were 
detached from the embryonic body, placed on glass slides and covered by a 
coverslip. The DAPI signal was detected by Zeiss LSM 710 (Organismal Biology 
and Anatomy, the University of Chicago). Individual nuclei were manually marked 
using Adobe Illustrator and the number of nuclei was counted. The data were 
analysed by t-test comparing the means. 

Adult fish skeletal staining. Skeletal staining was performed as previously 
described**. Briefly, fish were fixed in 10% neutral-buffered formalin overnight. 
After washing with milli-Q water, solutions were substituted by 70% EtOH in a 
stepwise fashion and then by 30% acetic acid/70% EtOH. Cartilage was stained 
with 0.02% alcian blue in 30% acetic acid/70% EtOH overnight. After washing 
with milli-Q water, the solution was changed to a 30% saturated sodium borate 
solution and incubated overnight. Subsequently, specimens were immersed in 
1% trypsin/30% saturated sodium borate and incubated at room temperature 
overnight. Following a milli-Q water wash, specimens were transferred into a 1% 
KOH solution containing 0.005% Alizarin Red S. The next day, specimens were 
washed with milli-Q water and subjected to glycerol substitution. Three replicates 
for each genotype were investigated. 

PMA staining and CT scanning. After skeletal staining, girdles and pectoral fins 
were manually separated from the body. Girdles and fins were stained with 0.5% 
PMA (phosphomolybdic acid) in milli-Q water for 16 h and followed by washes 
with milli-Q water. Specimens were placed into 1.5 ml Eppendorf tubes with water 
and kept overnight to settle in the tubes. The next day, tubes containing specimens 
were set and scanned with the UChicago PaleoCT scanner (GE Phoenix v|tome|x 
240 kV/180 kV scanner) (http://luo-lab.uchicago.edu/paleoCT.html), at 50 kVp, 
160 µA, no filtration, 5× averaging, exposure timing of 500 ms per image, and a 
resolution of 8 µm per slice (512 µm³ per voxel). Scanned images were analysed 
and segmented using Amira 3D Software 6.0 (FEI). Three replicates for single 
and double homozygotes and five for mosaic triple knockout were investigated. 

Extended Data Figure 1 | Cre in situ hybridization of lineage tracing 
fish. a, Cre is expressed only from 31 hpf to 38 hpf in Dr-CNS65x3-Cre, 
whereas it is expressed from 38 hpf to 55 hpf in Lo-e16x4-Cre. These 
temporal expression patterns of Cre indicate that our transgenic lineage 
tracing labelled the cells which experienced only early or late phase hox. 
Scale bars are 100 µm. b, Cre expression pattern from 48–120 hpf in 
independent Lo-e16x4-Cre lines (different founders from a). The fin is 
outlined by a dashed white line. The expression patterns from different 
founders were investigated and all expression ceases before 72 hpf. 
Our in situ results indicate that Lo-e16x4-Cre marks only the cells that 
experienced late phase hox expression from 38–55 hpf. n = 5 embryos 
for all stages. Scale bars are 100 µm. c, The expression pattern of and1 
and hox13 genes in wild type (10 dpf) and also Cre in Lo-e16x4-Cre line 
(10 dpf and 3 months, n = 10). Whereas and1 expression can be observed 
in fin fold (positive control, black arrow), hox13 genes are not expressed 
at 10 dpf in the wild type. Cre is not expressed at 10 dpf and at 3 months 
in the fin, indicating that Lo-e16x4-Cre activity is limited to only early 
embryonic development (38–55 hpf). Three month fins were dissected 
from the body of Lo-e16x4-Cre lines and subjected to in situ hybridization 
(n = 3). Scale bars are 500 µm at 10 dpf and 3 months. 

Extended Data Figure 2 | T7E1 assay of F0 CRISPR/Cas9 adult fish. 
PCR products of hoxa13a, hoxa13b or hoxd13a were subjected to a T7E1 
assay (Methods) and confirmed by gel electrophoresis. a, The result of the 
hoxa13a, hoxa13b or hoxd13a T7E1 assay for ten adult fish. ‘M’ is a 100 bp 
DNA ladder marker (NEB). In the hoxa13a gel picture, 810 bp (black 
arrowhead) is the wild-type band as observed in the cont. lane (wild type 
without gRNA injection). All ten fish showed smaller and bottom 
shifted products (red arrowheads) compared to negative control fish, 
indicating that all fish have mutations in the target region of hoxa13a. 
In the hoxa13b gel picture, 1,089 bp is the wild-type band. All ten fish into 
which hoxa13b gRNAs were injected showed smaller and bottom shifted 
products compared to negative control fish, indicating that all fish have 
mutations in the target region of hoxa13b. In the hoxd13a gel picture, 
823 bp is the wild-type band. Eight of ten fish showed smaller and bottom 
shifted products, indicating that 80% of fish have mutations in the target 
region of hoxd13a. b, The efficiency of CRISPR/Cas9 deletion for hox13 
in zebrafish. Almost all adult fish into which gRNAs and Cas9 mRNA 
were injected have mutations at the target positions. c, The efficiency of 
germline transmission of CRISPR/Cas9 mutant fish. Identified mutant 
fish were outcrossed to wild-type fish to obtain embryos and confirm 
germline transmission. Obtained embryos were lysed individually 
at 48 hpf, genotyped by T7E1 assay and sequenced. Because of CRISPR/ 
Cas9 mosaicism, some different mutation patterns, which result in a 
non-frameshift or frameshift mutation, were observed. 

a (gel lanes): cont., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 

b 
Gene      T7E1 positive/genotyped   Percent of mutant fish 
hoxa13a   20/20                     100.0% 
hoxa13b   20/20                     100.0% 
hoxd13a   18/20                     90.0% 

c 
Gene      T7E1 positive/total embryos   Frameshift mutation 
hoxa13a   3/10                          1 (10.0%) 
hoxa13b   2/10                          2 (20.0%) 
hoxd13a   6/10                          3 (33.3%) 

Extended Data Figure 3 | Embryonic phenotypes of hox13 deletion 
mutants. a, e, i, m, q, Whole body pictures at 72 hpf. a, Wild type, 
e, hoxa13a−/− (4 bp del./4 bp del.), i, hoxa13b−/− (4 bp del./14 bp ins.), 
m, hoxd13a−/− (5 bp ins./17 bp del.), and q, hoxa13a−/−, hoxa13b−/− 
double homozygous embryo (8 bp del./29 bp del., 14 bp ins./14 bp ins.). 
The details of mutant sequences are summarized in Extended Data Table 3. 
Wild-type and single homozygous fish for hoxa13a or hoxa13b were 
treated by PTU to inhibit pigmentation. The body size and length of 
mutant embryos are relatively normal at 72 hpf. n = 5 embryos for all 
genotypes. b, f, j, n, r, Bright field images of pectoral fins. Pectoral fins 
were detached from the body and photographed (Methods). Hoxa13a−/−, 
a13b−/− double homozygous embryo shows 30% shorter pectoral fin fold 
compared to wild type (r, see also Extended Data Fig. 5). n = 5 embryos 
for all genotypes. c, g, k, o, s, and1 in situ hybridization at 72 hpf. Hox13 
mutants show normal expression patterns, which indicates that fin fold 
development is similar to wild type in these mutants. n = 3 embryos for all 
genotypes. d, h, l, p, t, shha in situ hybridization at 48 hpf. Hox13 mutants 
show a normal expression pattern that is related to relatively normal 
anteroposterior asymmetry of adult fin (Fig. 3, Extended Data Fig. 4 and 
Supplementary Information). n = 3 embryos for all genotypes. Scale bars 
are 1 mm (a), 200 µm (b, c) and 100 µm (d). 

Extended Data Figure 4 | Phenotype of adult hox13 mutant fish. 
a, c, e, g, i, k, m, Whole body morphology of hox13 deletion mutants 
was photographed at 4 months old; hoxa13a−/− (8 bp del./29 bp del.), 
hoxa13b−/− (4 bp del./14 bp ins.), hoxd13a−/− (5 bp ins./10 bp ins.), 
hoxa13a−/−, hoxa13b−/− double homozygous fish (8 bp del./29 bp del., 
14 bp ins./14 bp ins.) and triple knockout (k, m, mosaic for hoxa13a, 
hoxa13b and hoxd13a) fish (Methods). n = 3 fish for wild type, single and 
double mutants and n = 5 fish for triple mosaic mutants (same specimens 
were used as in Fig. 3). The details of mutant sequences are summarized 
in Extended Data Table 3. Each homozygous mutant fish shows normal 
morphology at 4 months old except for slightly short pectoral fin rays of 
hoxa13a−/− or a13b−/− single mutants. Hoxa13a−/−, hoxa13b−/− double 
homozygous fish shows a severe reduction of fin rays in pectoral, pelvic, 
dorsal and anal fins compared with wild type. The triple knockout (mosaic 
for hoxa13a, hoxa13b and hoxd13a) fish also showed a reduction in fin 
rays. Scale bar is 5 mm. Owing to the size of the adult fish, three different 
pictures for anterior, centre and posterior of the body were merged to 
make whole-body pictures. b, d, f, h, j, l, n, Bone staining pictures of 
mutant fish. The endochondral bones of pectoral fins are shown. Whereas 
single homozygous fish show relatively normal proximal radials (b, d, f, h 
and Fig. 3), double homozygous mutants show fused third and fourth 
proximal radials (j). One triple knockout (mosaic for hoxa13a, hoxa13b 
and hoxd13a, 0, 25, 50%) fish had fused third and fourth proximal radials 
(l), but another triple knockout (0, 0, 0%) had more broken down proximal 
radials (n). n = 3 fish for wild type, single and double mutants and n = 5 
fish for triple mosaic mutants (same specimens were used as in Fig. 3). 
The scale bar is 500 µm. o, p, Examples of counting distal radials in 
wild-type and hoxa13a−/−, hoxa13b−/− double homozygous fish. First distal 
radials are not shown in CT segmentation because of a fusion with first 
fin ray. q, The number variation of distal radials in mutant fish. Multiple 
fins were investigated in wild type (25 fish/50 fins), hoxa13a−/− (4 bp 
del./4 bp del., 3 fish/6 fins), hoxa13b−/− (4 bp del./14 bp ins., 3 fish/6 fins), 
hoxd13a−/− (5 bp ins./17 bp del., 3 fish/6 fins), hoxa13a−/−, hoxa13b−/− 
double homozygous (8 bp del./29 bp del., 14 bp ins./14 bp ins., 3 fish/6 fins) 
and triple knockout (mosaic for hoxa13a, hoxa13b and hoxd13a) fish 
(5 fish/10 fins). The number of distal radials increased to 10 and 
13 in double and triple mutants, respectively. The difference in distal radial 
number between wild-type and double homozygous or wild-type and triple 
knockout fish (mosaic for hoxa13a, hoxa13b and hoxd13a) is statistically 
significant (P = 0.0014 or P = 0.00001, respectively, t-test comparing the 
means, two-tailed distribution). 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 




[Extended Data Figure 5, panels a and b: a, bright-field images of wild-type and hoxa13a−/−, a13b−/− pectoral fins with the fin fold indicated; b, bar chart of fin fold length (mm) for wild-type and hoxa13a−/−, a13b−/− embryos at 72 hpf and 96 hpf.] 

Extended Data Figure 5 | Analysis of embryonic fin fold and 
endochondral disk in hoxa13a−/−, hoxa13b−/− embryos. a, A bright 
field image of wild-type and hoxa13a−/−, hoxa13b−/− pectoral fins at 
72 hpf. Pectoral fins were detached from the body and photographed 
(Methods). Scale bar is 150 µm. b, The difference in fin fold length 
between wild-type and hoxa13a−/−, hoxa13b−/− embryos. The length of 
the fin fold was measured in wild-type (n = 8) and hoxa13a−/−, 
hoxa13b−/− double homozygous (n = 5) embryos at 72 hpf and 96 hpf 
(Methods). The length of the fin folds was decreased to about 70% of wild 
type in double homozygous embryos (72 hpf; P = 0.006, 96 hpf; P = 0.004, 
t-test comparing the means, one-tailed distribution, see Source Data). The 
error bars indicate s.e.m. c, d, Images of DAPI staining of wild-type (c) and 
hoxa13a−/−, hoxa13b−/− mutant (d) pectoral fins captured by confocal 
microscopy. White circles indicate nuclei in the endochondral disks. 
Scale bar is 200 µm. e, The average number of cells in the endochondral 
disk of wild-type and hoxa13a−/−, hoxa13b−/− mutant fins (see Methods 
and Source Data). The difference is statistically significant (P = 0.041 by 
Student’s t-test, one-tailed distribution). The error bars indicate s.e.m. 

Extended Data Table 1 | Primer and oligo sequences for lineage tracing 

Lineage tracing oligos 

CRE_PCR_F_NcoI 
5'- CGCCCTTCCATGGATGGCCAATTTACTGACCGTAG -3' 
CRE_PCR_R_BglII 
5'- GTTCTTCTGAAGATCTCTCTGGGGTTCGGGGCTGCAGG -3' 
CRE_Genotype_F 
5'- CGTACTGACGGTGGGAGAAT -3' 
CRE_Genotype_R 
5'- ACCAGGCCAGGTATCTCTGA -3' 
CRE_Probe_F 
5'- ATGGCCAATTTACTGACCGTAC -3' 
CRE_Probe_R 
5'- CTAATCGCCATCTTCCAGCAGGCG -3' 

Random_Oligo_SmaI 


5'- CTGCTCTGGTCAGCCTCTAATGGCTCGT TAGATAGTCTAGCCGCTGGTAATCACTCGATGACCTCGGCTCCCCAT TGGTGCTACGGCGATTCT TGGAGAGCCAGCTGCGATCGCTAATGTGAGGACAGTGTAATAT TAG 
CAAGCGATAAGTCCCCAACTGGTTGTGGCCTTT TGAAAAGTGAACTTCATAACATATGCTGTCTCACGCACATGGATGGTTTGGACAAATTTGAT TCAAGTCTGATCAACCT TCACTGCTCTAGAATCAAAAGCAGTGATCTC 
CCGGGTGCGAAATAAA -3' SmaI site italicized in bold 


Lo-e16_Oligo_1_BamHI_SmaI 


5'- CCCCCAAAAAATGACAAAACTCTTGGAATTTATTACGGCTTTGGCAATAGAGACCGCTTTTTGGGTGGCTCAGTAAAAGGTTTGATGTTCACGTATCGCCTTTTAAATGCATTCATTCCTCTTTCATATGTGTGCAACTGTT 
TAGATACATCATAAAAATGTCACCATTGAGGTTCCCCATTAGGCATCTACCCGTTCTCCTCCAGGCCATGGAGATAAATTTGGACCAGGTGATCCCCTCCTAGAAGAGCCCTTGATGTCTTCTGGTAATGAGTTGAAAGCGGA 
AGCTGTCAGCCTTCAGCAGGCATGAAGATGCAATTAGAGCTGCGTTCAAAGTGCCCAGGCAGTCTCATAAGGAGCACTAGCCTTGGTGTAAGCTGCTTATTCACAGATCAGTTATGTAAGGGTACAGCAAAAAGGCAAGAG 
ACTCGATTTTTGAATGACACAGCAAAGTCGGTGCGGATCCCGAGTTTGCCCGGGTAGCCC -3' BamHI and SmaI sites italicized in bold 


Lo-e16_Oligo_2_BamHI_SalI_SmaI 


5'- CCCCCAAAAAATGACAAAAGGATCCGAATTTAT TACGGCTT TGGCAATAGAGACCGCTTTTTGGGTGGCTCAGTAAAAGGTTTGATGTTCACGTATCGCCTTTTAAATGCATTCATTCCTCTT TCATATGTGTGCAACTGTTT 
AGATACATCATAAAAATGTCACCAT TGAGGT TCCCCAT TAGGCATCTACCCGT TCTCCTCCAGGCCATGGAGATAAATT TGGACCAGGTGATCCCCTCCTAGAAGAGCCCTTGATGTCT TCTGGTAATGAGT TGAAAGCGGAAG 
CTGTCAGCCTTCAGCAGGCATGAAGATGCAAT TAGAGCTGCGTTCAAAGTGCCCAGGCAGTCTCATAAGGAGCACTAGCCT TGGTGTAAGCTGCT TAT TCACAGATCAGT TATGTAAGGG TACAGCAAAAAGGCAAGACACT 
CGATTTTTGAATGACACAGCAAAGTCGTCGACTTCTCCGAGCCCGGGAAACTAGCCC -3’ BamHI, SalI and SmaI sites italicized in bold 


Lo-e16_Oligo_3_SalI_BglII_SmaI 


5'- CCCCCAAAAAATGACGTCGACCTTGGAATTTATTACGGCTTTGGCAATAGAGACCGCTTTTTGGGTGGCTCAGTAAAAGGTTTGATGTTCACGTATCGCCTTTTAAATGCATTCATTCCTCTTTCATATGTGTGCAACTGTT 
TAGATACATCATAAAAATGTCACCATTGAGGTTCCCCATTAGGCATCTACCCGTTCTCCTCCAGGCCATGGAGATAAATTTGGACCAGGTGATCCCCTCCTAGAAGAGCCCTTGATGTCTTCTGGTAATGAGTTGAAAGCGGAA 
GCTGTCAGCCTTCAGCAGGCATGAAGATGCAATTAGAGCTGCGTTCAAAGTGCCCAGGCAGTCTCATAAGGAGCACTAGCCTTGGTGTAAGCTGCTTATTCACAGATCAGT TATGTAAGGGTACAGCAAAAAGGCAAGACAC 
TCGATTTTTGAATGACACAGCAAAGTCGAGATCTTCTCCGAGTCCCGGGAACTAGCCC -3’ SalI, BglII and SmaI sites italicized in bold 


Lo-e16_Oligo_4_BglII_SmaI 


5'- CCCCCAAAAAATGAGATCTCTCTTGGAATTTATTACGGCTTTGGCAATAGAGACCGCTTTTTGGGTGGCTCAGTAAAAGGTTTGATGTTCACGTATCGCCTTTTAAATGCATTCATTCCTCTTTCATATGTGTGCAACTGTTT 
AGATACATCATAAAAATGTCACCATTGAGGTTCCCCAT TAGGCATCTACCCGTTCTCCTCCAGGCCATGGAGATAAATTTGGACCAGGTGATCCCCTCCTAGAAGAGCCCTTGATGTCTTCTGGTAATGAGTTGAAAGCGGAAG 
CTGTCAGCCTTCAGCAGGCATGAAGATGCAATTAGAGCTGCGTTCAAAGTGCCCAGGCAGTCTCATAAGGAGCACTAGCCTTGGTGTAAGCTGCTTATTCACAGATCAGT TATGTAAGGGTACAGCAAAAAGGCAAGACACT 
CGATTTTTGAATGACACAGCAAAGTCCCCGGGTTCTCCGAGAAACTAGCCC -3' BglII and SmaI sites italicized in bold 


Primers for final PCR to clone into destination vector: 
e16x4_F_XhoI: 
5’- CAGGCTCCCTCGAGCCCCCAAAAAATGACAAA -3’ 
e16x4_R_SmaI: 
5'- CGAATTCGGTCCCGGGACTTTGCTG -3’ 


Dr-CNS65_Oligo_1_BamHI_SmaI 


5'- GAGGTTCACCTTTAACCACAACACGTAACAAATCAGATCTCAGAAGACAAGCCGCT TCAGAAGTCGTGCTCAGTGTTGCAT TCAAGCGTGTGTGATTT TCCAGACTGTCTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT 
GTGTGTGTGTGTGTGCTCTCAGAGATCTTTCATTGGGGAATCTT TCCTGTGTGAGAGCTGCGGTCTCAGCGGCTGATT TATGGCGCTCCGCAGCTATGCTCATGCTACGCTAACAATGCTCAT TAAAAAGAGGATGTCATCAC 
TCCGCGACACCGCAGGACTCGTATGTGTCACATGCATCCTCAATACAGCGAACCGCTGACCAATACCGTCCACAACATCCTGTAAATCTGTCATCGCCAGCATGGCCGCGGAAACACACACACACACACACACCAT TAGAGTG 
CAGTAATAGAGGATCAGAGGTTAATGTGGAGCTGTTTGCTGGTGTTTAGTTTTGTAT TAGAGGAT TTCACGTGCT TACAGCTATGTGTGTGTGTT TGAACAG TAAAGAAAGTATAAAAAGTAAAATAT TATAATCT TAAGCCACTCG 
TAATCTTCAAAAAACACTAAAATGCAAGAATAACGGATCCCTTTCACACTAGAGCCCGGGAAAGTGAGCGTT -3’ BamHI and SmaI sites italicized in bold 


Dr-CNS65_Oligo_2_BamHI_SalI_SmaI 


5’- GAGGTTCACCTT TAGGATCCACACGTAACAAATCAGATCTCAGAAGACAAGCCGCT TCAGAAGTCGTGCTCAGTGTTGCAT TCAAGCGTGTGTGATTT TCCAGACTGTCTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT 
GTGTGTGTGTGTGTGCTCTCAGAGATCTTTCAT TGGGGAATCTT TCCTGTGTGAGAGCTGCGGTCTCAGCGGCTGATTTATGGCGCTCCGCAGCTATGCTCATGCTACGCTAACAATGCTCAT TAAAAAGAGGATGTCATCAC 
TCCTGATTTATGGCGCTCCGCAGCTATGCTCATGCTACGCTAACAATGCTCAT TAAAAAGAGGATGTCATCACTCCGCGACACCGCAGGACTCGTATGTGTCACATGCATCCTCAATACAGCGAACCGCTGACCAATACCGTCC 
ACAACATCCTGTAAATCTGTCATCGCCAGCATGGCCGCGGAAACACACACACACACACACACCAT TAGAGTGCAGTAATAGAGGATCAGAGGT TAATGTGGAGCTGTTTGCTGGTGTT TAGTTTTGTAT TAGAGGATTTCACGT 
GCTTACAGCTATGTGTGTGTGTT TGAACAGTAAAGAAAGTATAAAAAGTAAAATAT TATAATCT TAAGCCACTCGTAATCT TCAAAAAACAC TAAAATGCAAGAATAAGTCGACCCTT TCACACTAGAGCCCGGGAAAGTGAGCGT 
T -3' BamHI, SalI and SmaI sites italicized in bold 


Dr-CNS65_Oligo_3_SalI_SmaI 

5’- GAGGTTCACCTT TAGTCGACACACGTAACAAATCAGATCTCAGAAGACAAGCCGCT TCAGAAGTCGTGCTCAGTGTTGCAT TCAAGCGTGTGTGAT TT TCCAGACTGTCTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT 
GTGTGTGTGTGTGTGCTCTCAGAGATCTTTCAT TGGGGAATCTT TCCTGTGTGAGAGCTGCGGTCTCAGCGGCTGATTTATGGCGCTCCGCAGCTATGCTCATGCTACGCTAACAATGCTCAT TAAAAAGAGGATGTCATCAC 
TCCGCGACACCGCAGGACTCGTATGTGTCACATGCATCCTCAATACAGCGAACCGCTGACCAATACCGTCCACAACATCCTGTAAATCTGTCATCGCCAGCATGGCCGCGGAAACACACACACACACACACACCAT TAGAGTG 
CAGTAATAGAGGATCAGAGGTTAATGTGGAGCTGTT TGCTGGTGTTTAGTTTTGTAT TAGAGGATT TCACGTGCT TACAGCTATGTGTGTGTGTT TGAACAGTAAAGAAAGTATAAAAAGTAAAATAT TATAATCT TAAGCCACTC 
GTAATCTTCAAAAAACACTAAAATGCAAGAATAACCCTTTCACACTAGAGCCCGGGAAAGTGAGCGTT -3’ SalI and SmaI sites italicized in bold 

Primers for final PCR to clone into destination vector: 

CNS65x3_F_XhoI: 
5’- GCAGGCTCCTCGAGGAGGTTCACCTTTAACCA -3’ 
CNS54x3_R_SmaI: 
5'- AACGCTCACTTTCCCGGGTCTAGTGT -3’ 


PCR primers and oligos for construction of lineage tracing vectors are listed (See Methods). Restriction enzyme sites that were used for ligating oligos are highlighted in italics 
and bold in oligo sequence. 

Extended Data Table 2 | PCR primers for CRISPR/Cas9 deletion, T7E1 assay, genotypes and gene cloning 

CRISPR gRNA oligos 
zebra hoxa13a_gRNA1_F 
5’- AATTAATACGACTCACTATAGGGCAATCACAACCAGTGGAGTTTTAGAGCTAGAAATAGC -3’ 
zebra hoxa13a_gRNA2_F 
5’- AATTAATACGACTCACTATAGGCAGTAAAGACTCATGTCGGTTTTAGAGCTAGAAATAGC -3’ 
zebra hoxa13b_gRNA1_F 
5’- AATTAATACGACTCACTATAGGATGATATGAGCAAAAACAGTTTTAGAGCTAGAAATAGC -3’ 
zebra hoxa13b_gRNA2_F 
5’- AATTAATACGACTCACTATAGGACACTTCTGTTTCTGGAGGTTTTAGAGCTAGAAATAGC -3’ 
zebra hoxd13a_gRNA1_F 
5’- AATTAATACGACTCACTATAGGCTCTGGCTCCTTCACGTTGTTTTAGAGCTAGAAATAGC -3’ 
zebra hoxd13a_gRNA2_F 
5’- AATTAATACGACTCACTATAGGCGAACTCTTTAAGCCAGCGTTTTAGAGCTAGAAATAGC -3’ 
zebra gRNA_R 
5’- AAAAGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAACTTGCTATTTCTAGCTCTAAAAC -3’ 

T7 assay primers 
zebra hoxa13a_Cont_F 
5’- CTGCAGCGGGTGATTCTG -3’ 
zebra hoxa13a_Cont_R 
5’- CTCCTTTACCCGTCGGTTTT -3’ 
PCR product: 810 bp 
zebra hoxa13b_Cont_F 
5’- GAAGCTTATCACTAGAATCTTTACAGC -3’ 
zebra hoxa13b_Cont_R 
5’- TTTTTCTCAGGGCCTAAAGGT -3’ 
PCR product: 1,089 bp 
zebra hoxd13a_Cont_F 
5’- AGCTGCCCAATCACATGC -3’ 
zebra hoxd13a_Cont_R 
5’- CGATTATAAATTCAGTTGCTCTTTAG -3’ 
PCR product: 823 bp 

Genotype primers for single (hoxa13a or a13b) and double (hoxa13a, a13b) mutants 
zebra hoxa13a_8 bp del_F 
5’- GCCAAGGAGTTTGCCTTGTA -3’ 
zebra hoxa13a_8 bp del_R 
5’- TGACGACTTCCACACGTTTC -3’ 
PCR product: wild-type 231 bp, mutant (cut by AvaI) 111 + 119 bp 
zebra hoxa13a_29 bp del_F 
5’- CAGGCAATAAGCGGGCCTT -3’ 
zebra hoxa13a_29 bp del_R 
5’- GTGCAGTAGACCTGTCCGTT -3’ 
PCR product: wild-type 110 bp, mutant 81 bp 
zebra hoxa13b_14 bp ins_F 
5’- GATTGACCCGGTGATGTTTC -3’ 
zebra hoxa13b_14 bp ins_R 
5’- TACACTGGTTCGCAGCAAAA -3’ 
PCR product: wild-type 98 bp, mutant (cut by BccI) 53 + 57 bp 

Cloning primers 
Danio_and1_F 
5’- ACCTGCTCCTGCTCCAGTTA -3’ 
Danio_and1_R 
5’- CACATCCTCTTGAGGGGAAA -3’ 

For synthesis of gRNAs, each forward primer and the common reverse primer (‘zebra gRNA_R’) were hybridized and used as templates. For genotyping of single and double mutants, PCR products 
were treated by the enzymes indicated. 

Extended Data Table 3 | List of hox13 mutant sequences 


a. 
hoxa13a 
TCCAGGCAATAAGCGGGCCTTGGGAGCCCCGACATGAGTCTTTACTGCCGATGGAGAGTTACCAACCGTGGGCAATCACAACCAGTGGATGGAAC wild-type 
TCCAGGCAATAAGCGGGCCTTCGGAGC -- - - -++++-+-GATGGAGAGTTACCAACCGTGGGCAATCACAACCAG -- -- -- - AAC -22 -7 = 29 bp deletion 
TCCAGGCAATAAGCCCGCCTTCGGAGCCCCGA -- - - - GTCTTTACTGCCGATGGAGAGTTACCAACCGTGGGCAATCACAA -- - GTGGATGGAAC -5 -3 = 8 bp deletion 
TCCAGGCAATAAGCCCGCCTTGGGAGCCCC --- -- GAGTCTTTACTGCCGATGGAGAGTTACCAACCGTGGGCAATCACAACCAG|TITGGATGGAAC = -5 + 1 = 4 bp deletion 
b. 
hoxa13b 
CTATGACAACGGTTTGGATGATATGAGCAAAAACATGGAAGG-—~-------— ae TACATGGACACTTCTGTTTCTGGAGAGGAGT wild-type 
CTATGACAACGGTTTGGATGATATGAGCAAAT[GGAAGGATGGAGC]ACATGGAAGG---------- TACATGGACACTTCTGTTTCTGGAGAGGAGT (+1bp ins.) 14 bp insertion 
CTATGACAACGGTTTGGATGATATGAGCAA - -- - ATGGAAGG--------- a TACATGGACACTTCTGTTTCTGGAGAGGAGT 4 bp deletion 
c. 
hoxd13a 
ATCCAATATGGCTCTGGCTCCTTCACGTITGGATGCCATT CGGTGAAGCCTCCAGCTGGCTTAAAGAGTTCGCCTTTTATCAAGG wild-type 
ATCCAATATGGCTCTGGCTCCI[AAAAAAA]TTGCCGTTTGGATGCCATT--------------------- CGGTGAAGCCTCCAG[CCTICCAGCTTAAAGAGTTCGCCTITTATCAAGG 7 +3= 10 bp insertion 
ATCCAATATGGCTCTGGCTCCTTCAAAT[GGCTC]TTGGATGCCATT =n CGGTGAAGCCTCCAGCTGGCTTAAAGAGTTTGCCTTTTATCAAGG 5bp insertion 
ATCCAATATGGCTCTGGCTCC- - - ---TTTGGATGCCATT---------------- CGGTGAA -- - ---- - - -- GCTTAAAGAGTTCGCCTTTTATCAAGG -11 -6 = 17 bp deletion 


d. 
hoxa13a 
TCCAGGCAATAAGCGGGCCTTGGGAGCCCCGACATGAGTCTTTACTGCCGATGGAGAGTTACCAACCGTGGGCAATCACAACCAGTGGATGGAAC wild-type 0% 
TCCAGGCAATAAGCCCGCCTTCGGAGCCCCGA - -- - - GTCTTTACTGCCGATGGAGAGTTACCAACCGTGGGCAATCACAA - - - GTGGATGGAAC -5 -3 = 8 bp deletion 100% 
e. 
hoxa13b 
CTATGACAACGGTTTGGATGATATGAGCAAAAACATGGAAGG-~---n---a--nnon anno TACATGGACACTICTGTTTCTGGAGAGGAGT wild-type 0% 
CTATGACAACGGTTTGGATGATATGAGCAAAT[GGAAGGATGGAGC]ACATGGAAGG- TACATGGACACTICTGTTTCTGGAGAGGAGT (+1bpins.) 14 bp insertion 50% 
CTATGACAACGGTTTGGATGATATGAGC - AATTTTTT GAAGG-—-—---------—-=-----————— === TACATGGACACTICTGTTICT - - -- -- GAGT -1-6=7 bp deletion 50% 
f. 
hoxd13a 
ATCCAATATGGCTCTGGCTCCTTCACGTTTGGATGCCOA TT -m-nn-=--nn-n-eeeno= CGGTGAAGCCTCCAGCTGGCTTAAAGAGTICGCCTTTTATCAAGG wild-type 0% 
ATCCAATATGGCTCTGGCTCCTI - - - --- TEGATGCCATT-—-------------------- CGGTGAAGCCTCCAGCTGGCTTAAAGAGTTTGCCTTTTATCAAGG 6bp deletion 14.3% 
ATCCAATATGGCTCTGGCTCCTICT|GGCT|CGTTTGGATGCCATT----------- CGGTGAAGCCTCCAGCTGGCTTAAAGAGTTTGCCTTTTATCAAGG 4bp inserttion 28.5% 
ATCCAATATGGCTCTGGCTCCTICAC[TCCAAATGIGTITGGATGCCATT--- CGGTGAAGCCTCCAGCTGGCTTAAAGAGTTCGCCTTTTATCAAGG 8bp inserttion 14.3% 
we tee eee eee nee 2 eee oe = CGTTTGGATGCCA TT onnreennnseeeeenne--+ CGGTGAAGCCTCCAGCTGGCTTAAAGAGTICGCCTITTATCAAGG 170bp deletion 14.3% 
ATCCAATATGGCTCTGGCTCCTICG[GCTICGTITGGATGCCATT-------------- CGGTGAAGCCTCCAGCTGGCTTAAAGAGTICGCCTITTATCAAGG 3bp insertion 14.3% 
ATCCAATATGGCTCTGGCTCCTICAC[TCCAAATGIGTITGGATGCCATT--- CGGTGAAGCCTCCA -- - - GCTTAAAGAGTTCGCCTTTTATCAAGG +8-4=4bpinsertion 14.3% 
g. 
Figure 3, Extended Data Fig. 4a–p (4-month-old adults) 
              hoxa13a                   hoxa13b                   hoxd13a 
hoxa13a −/−   8 bp del. / 29 bp del. 
hoxa13b −/−                             4 bp del. / 14 bp ins. 
hoxd13a −/−                                                       5 bp ins. / 10 bp ins. 
double homo   8 bp del. / 29 bp del.    14 bp ins. / 14 bp ins. 
Triple KO1    0%                        25%                       50% 
Triple KO2    0%                        0%                        0% 

Extended Data Fig. 3, 5 (embryo), Extended Data Fig. 4q (adults for radial count) 
              hoxa13a                   hoxa13b                   hoxd13a 
hoxa13a −/−   4 bp del. / 4 bp del. 
hoxa13b −/−                             4 bp del. / 14 bp ins. 
hoxd13a −/−                                                       5 bp ins. / 17 bp del. 
double homo   8 bp del. / 29 bp del.    14 bp ins. / 14 bp ins. 

Frame-shift mutation alleles that were used for each experiment are listed. The top sequence in each column shows wild type with gRNA sequence in red. Green is insertional and blue is substitutional 
mutations. a, hoxa13a mutation patterns. Three types of mutations were used in this paper. Horizontal bars are deletional mutations. b, hoxa13b mutation patterns. Sequences flanked by two 
gRNAs are abbreviated by black horizontal bars. Additional 1 bp is inserted at 3’ side of the gRNA target site in ‘14 bp insertion’. c, hoxd13a mutation patterns. Sequences flanked by two gRNAs are 
abbreviated by black horizontal bars. d–f, Mutational patterns in a triple knockout (mosaic for hoxa13b and hoxd13a) fish that is shown in Fig. 3p–r are listed. Sequences flanked by two gRNAs are 
abbreviated by horizontal bars in e and f. Each hox13 gene shows some different mutations indicating that this fish is highly mosaic. The percentage of mutant alleles was calculated from the result of 
deep sequencing (Fig. 3 and Extended Data Table 4). g, Summary of genotype in all experiments. del. (deletion), ins. (insertion). 

Extended Data Table 4 | Genotyping of progeny from mutant crosses 

a 
hoxa13a+/− × hoxa13a+/− 
                    +/+          +/−          −/−          Total 
Embryos (72 hpf)    9 (25.0%)    17 (47.2%)   10 (27.8%)   36 
Adult               9 (21.4%)    20 (47.6%)   13 (31%)     42 

hoxa13b+/− × hoxa13b+/− 
                    +/+          +/−          −/−          Total 
Embryos (72 hpf)    8 (25.0%)    20 (62.5%)   12 (37.5%)   32 
Adult               20 (32.3%)   32 (51.6%)   10 (16.1%)   62 

hoxd13a+/− × hoxd13a+/− 
                    +/+          +/−          −/−          Total 
Embryos (72 hpf)    8 (22.9%)    18 (51.4%)   9 (25.7%)    35 
Adult               5 (26.3%)    11 (57.9%)   3 (15.8%)    19 

b 
hoxa13a+/−, a13b+/− × hoxa13a+/−, a13b+/− (72 hpf) 
a13a \ a13b   +/+          +/−           −/−          Total 
+/+           20 (11.0%)   25 (13.7%)    10 (5.5%)    55 (30.2%) 
+/−           23 (12.6%)   50 (27.5%)    10 (5.5%)    83 (45.6%) 
−/−           6 (3.3%)     28 (15.4%)    10 (5.5%)    44 (24.2%) 
Total         49 (26.9%)   103 (56.6%)   30 (16.5%)   182 (100.0%) 

hoxa13a+/−, a13b+/− × hoxa13a+/−, a13b+/− (Adult) 
a13a \ a13b   +/+          +/−           −/−          Total 
+/+           4            8             0            12 (22.2%) 
+/−           4            18            5            27 (50.0%) 
−/−           3            5             7            15 (27.8%) 
Total         11 (20.4%)   31 (57.4%)    12 (22.2%)   54 (100.0%) 

c 
                                Total adult fish   Short-finned fish   % 
Negative control: Cas9 only     96                 0                   0.00 
Cas9, hoxa13b and d13a gRNAs    161                7                   4.35 

d 
Genotype of short fin fish (the percentage of normal alleles is shown) 
          #1     #2    #3    #4    #5    #6     #7 
hoxa13a   20%    50%   0%    0%    25%   25%    0% 
hoxa13b   20%    0%    25%   0%    0%    0%     0% 
hoxd13a   100%   67%   50%   25%   30%   100%   0% 

a, Breeding data in hox13 single mutants. Single heterozygous fish were crossed with each other to obtain embryos and the next generation. Embryos (72 hpf) or adult fish (3 months) were genotyped by 
T7E1 assay and sequenced. The number of each genotype and the percentages are shown. The genotype ratios approximately follow the Mendelian ratio. b, Breeding data for double hoxa13 mutants. 
Double heterozygous fish (hoxa13a+/-, hoxa13b+/-) were crossed to obtain embryos and the next generation. Embryos (72 hpf) or adult fish (3 months) were genotyped by PCR followed by enzyme 
digestion (Methods) or sequencing. The number of each genotype and the percentages are shown. The genotype ratios approximately follow the Mendelian ratio. c, The efficiency of triple knockout 
(mosaic for hoxa13a, hoxa13b and hoxd13a) in zebrafish (see Methods). The numbers of normal adult fish and short-finned fish from the negative control injection (Cas9 mRNA without gRNAs) or the triple 
knockout injection (Cas9 mRNA with gRNAs) are shown. Genotypes for short-finned fish were calculated from deep sequencing of each allele and are shown as a percentage of normal alleles in d. 
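
The statement that genotype ratios approximately follow the Mendelian ratio can be checked with a chi-square goodness-of-fit test. The sketch below is illustrative only and is not the authors' analysis: the function name is hypothetical, the expected 1:2:1 ratio is the standard expectation for a cross between two heterozygotes, and the example counts are the hoxa13a embryo counts from panel a.

```python
import math


def chi_square_mendelian(observed, ratio=(1, 2, 1)):
    """Chi-square goodness-of-fit of genotype counts against an expected ratio.

    `observed` holds counts for (+/+, +/-, -/-). With three classes there are
    two degrees of freedom, for which the chi-square survival function has the
    closed form exp(-chi2 / 2), so no statistics library is needed.
    """
    if len(observed) != 3 or len(ratio) != 3:
        raise ValueError("expected exactly three genotype classes")
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * r / ratio_sum for r in ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.exp(-chi2 / 2.0)  # valid only for 2 degrees of freedom
    return chi2, p_value


# hoxa13a embryos (72 hpf) from panel a: 9 (+/+), 17 (+/-), 10 (-/-)
chi2, p_value = chi_square_mendelian([9, 17, 10])
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A large p value means the counts give no evidence of a departure from 1:2:1; it does not prove that the ratio is exactly Mendelian, particularly at these sample sizes.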
Chagas disease, leishmaniasis and sleeping sickness affect 20 million 
people worldwide and lead to more than 50,000 deaths annually. 
The diseases are caused by infection with the kinetoplastid parasites 
Trypanosoma cruzi, Leishmania spp. and Trypanosoma brucei spp., 
respectively. These parasites have similar biology and genomic 
sequence, suggesting that all three diseases could be cured with 
drugs that modulate the activity of a conserved parasite target. 
However, no such molecular targets or broad spectrum drugs have 
been identified to date. Here we describe a selective inhibitor of the 
kinetoplastid proteasome (GNF6702) with unprecedented in vivo 
efficacy, which cleared parasites from mice in all three models of 
infection. GNF6702 inhibits the kinetoplastid proteasome through 
a non-competitive mechanism, does not inhibit the mammalian 
proteasome or growth of mammalian cells, and is well-tolerated 
in mice. Our data provide genetic and chemical validation of the 
parasite proteasome as a promising therapeutic target for treatment 
of kinetoplastid infections, and underscore the possibility of 
developing a single class of drugs for these neglected diseases. 
Kinetoplastid infections affect predominantly poor communities 
in Latin America, Asia and Africa. Available therapies suffer from 
multiple shortcomings, and new drug discovery for these diseases is 
limited by insufficient investment. We sought low molecular weight 
compounds with a growth inhibitory effect on Leishmania donovani, 
Trypanosoma cruzi and Trypanosoma brucei. Our approach was to 
test 3 million compounds in proliferation assays on all three parasites 
(Supplementary Information Tables 1-3), followed by triaging of active 
compounds (half-maximum inhibitory concentration value EC50 
< 10 µM) to select those with a clear window of selectivity (> fivefold) 
                    GNF5343 (hit from screen)   GNF2636 
L. donovani EC50    7.3 ± 0.6 µM                350 ± 7.1 nM 
T. brucei EC50      150 ± 7.5 nM                79 ± 2.9 nM 
T. cruzi EC50       75 ± 0.8 nM                 55 ± 12 nM 
3T3 CC50            17 ± 0.9 µM                 9.0 ± 0.9 µM 
Macrophage CC50     > 50 µM                     14 ± 2.1 µM 
F                   < 5%                        ND 
CL                  ND                          ND 


Figure 1 | Chemical evolution of GNF6702 from the phenotypic hit 
GNF5343. Leishmania donovani, amastigotes proliferating within primary 
mouse macrophages; T. brucei, the bloodstream form trypomastigotes; 

T. cruzi, amastigotes proliferating in 3T3 fibroblast cells; macrophage, 
mouse primary peritoneal macrophages; EC50 and CC50, half-maximum 


with respect to growth inhibition of mammalian cells. An azabenzoxazole, 
GNF5343, was identified as a hit in the L. donovani and T. brucei 
screens. Although GNF5343 was not identified in the T. cruzi screen, 
we noted potent anti-T. cruzi activity of this compound in secondary 
assays. 

Optimization of GNF5343 involved the design and synthesis of 
~3,000 compounds, and focused on improving bioavailability and 
potency on inhibition of L. donovani growth within macrophages 
(Fig. 1). A critical modification involved replacement of the 
azabenzoxazole centre with C6-substituted imidazo- and triazolopy- 
rimidine cores, which yielded compounds up to 20-fold more potent on 
intra-macrophage L. donovani (for example, GNF2636). Replacement 
of the furan group with a dimethyloxazole ring reduced the risk of 
toxicity associated with the furan moiety, and replacement of the 
chlorophenyl group with a fluorophenyl improved selectivity over 
mammalian cell growth inhibition (for example, GNF3849). These 
changes also resulted in low clearance and acceptable bioavailability. 
Further substitutions at the core C6 position led to GNF6702 and a 
400-fold increase in intra-macrophage L. donovani potency compared 
to GNF5343. 
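
The fold improvements quoted above follow directly from the intra-macrophage L. donovani EC50 values listed in Fig. 1. The sketch below reproduces that arithmetic. The helper names are hypothetical, and the assignment of the 18 nM value to GNF6702 is an assumption, although it is consistent with the 400-fold statement in the text.

```python
# Unit conversion factors to nanomolar; "uM" stands for micromolar.
TO_NANOMOLAR = {"nM": 1.0, "uM": 1_000.0}


def to_nanomolar(value, unit):
    """Convert a concentration to nanomolar."""
    return value * TO_NANOMOLAR[unit]


def fold_improvement(reference_ec50_nm, new_ec50_nm):
    """How many times lower the new EC50 is than the reference EC50."""
    return reference_ec50_nm / new_ec50_nm


# Intra-macrophage L. donovani EC50 values from Fig. 1
gnf5343_ec50 = to_nanomolar(7.3, "uM")   # screening hit
gnf2636_ec50 = to_nanomolar(350, "nM")
gnf6702_ec50 = to_nanomolar(18, "nM")    # assumed to be the GNF6702 column

print(f"GNF2636 vs GNF5343: {fold_improvement(gnf5343_ec50, gnf2636_ec50):.0f}-fold")
print(f"GNF6702 vs GNF5343: {fold_improvement(gnf5343_ec50, gnf6702_ec50):.0f}-fold")
```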

Leishmania donovani parasites cause a majority of visceral leishmaniasis 
(VL) cases in East Africa and India. In mice infected 
with L. donovani, oral dosing with GNF6702 effected a more 
pronounced reduction in liver parasite burden than miltefosine, the 
only oral anti-leishmanial drug available in clinical practice (Fig. 2a). 
The miltefosine regimen for VL efficacy studies was chosen to 
approximate the drug plasma concentration of the clinical regimen. 
We noted a greater than three-log reduction in parasite load after 
                    GNF3849                     GNF6702 
L. donovani EC50    71 ± 3.0 nM                 18 ± 1.8 nM 
T. brucei EC50      22 ± 2.0 nM                 70 ± 1.5 nM 
T. cruzi EC50       16 ± 0.9 nM                 120 ± 2.6 nM 
3T3 CC50            5.0 ± 1.3 µM                > 20 µM 
Macrophage CC50     9.3 ± 1.7 µM                > 50 µM 
F                   34%                         34% 
CL                  2.5 ml min⁻¹ kg⁻¹           2.0 ml min⁻¹ kg⁻¹ 


growth-inhibition concentration; F, oral bioavailability in mouse after 
administering single compound dose (20 mg kg⁻¹) as a suspension; CL, 
plasma clearance in mouse after single i.v. bolus dose (5 mg kg⁻¹); ND, 
not determined; all EC50 and CC50 values correspond to means ± s.e.m. 
(n = 4 technical replicates). 
infections. a, Post-treatment L. donovani liver burdens in mouse model 
of VL as assessed by qPCR (n = 5 mice). b, PK/PD relationship for ten 
GNF6702 analogues, each administered at several doses; circles, mean 
liver burdens associated with individual compound regimens (30 regimens 
in total; n = 5 mice per regimen) relative to vehicle; horizontal dotted 
line, 90% reduction in the liver L. donovani burden; vertical dotted line, 
0.94-fold multiple of the mean free compound plasma concentration/the 
L. donovani EC99 value ratio. c, Post-treatment L. major footpad burdens 
in the BALB/c mouse model of CL as assessed by qPCR (n = 6 mice); the 
P values (two-tailed distribution) relate parasite burdens in compound- 
treated mice with those from vehicle-treated mice; left inset picture, 
a representative mouse footpad after treatment with vehicle; right inset 
picture, a representative mouse footpad after treatment with GNF6702 
10 mg kg⁻¹ twice-daily regimen. d, T. cruzi burden in mouse blood 
(circles), colon (triangles) and heart (diamonds) as assessed by qPCR after 


eight-day treatment with 10 mg kg⁻¹ of GNF6702 twice-daily with the 
free concentration of GNF6702 (fraction unbound in plasma = 0.063) 
staying above the L. donovani EC99 value (the concentration inhibiting 
99% of intra-macrophage parasite growth in vitro) for the duration of 
the dosing period (Extended Data Fig. 1a). Characterization of efficacy 
of ten analogues in the series at various doses revealed a significant 
correlation (r² = 0.89, P < 0.01) between (i) the ratio of mean free 
plasma compound concentration to the L. donovani EC99 value and 
(ii) reduction of the liver parasite burden. We found that 90% parasite 
burden reduction in the mouse model was achieved when the mean 
free compound plasma concentration during treatment equalled a 
0.94-fold multiple of the L. donovani EC99 value (Fig. 2b). 
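
The exposure metric used in this PK/PD relationship is the free (unbound) plasma concentration divided by the in vitro EC99. A minimal sketch of that calculation follows. The fraction unbound (0.063) and the 0.94-fold threshold come from the text; the function names, the total plasma concentration and the EC99 value are hypothetical placeholders chosen only to show the arithmetic.

```python
FRACTION_UNBOUND = 0.063       # GNF6702 fraction unbound in plasma (from the text)
THRESHOLD_MULTIPLE = 0.94      # multiple associated with 90% burden reduction (Fig. 2b)


def free_concentration(total_plasma_nm, fraction_unbound):
    """Free plasma concentration = total concentration x fraction unbound."""
    return total_plasma_nm * fraction_unbound


def exposure_multiple(total_plasma_nm, fraction_unbound, ec99_nm):
    """Ratio of mean free plasma concentration to the in vitro EC99."""
    return free_concentration(total_plasma_nm, fraction_unbound) / ec99_nm


# Hypothetical inputs, for illustration only (not values from the paper)
mean_total_plasma_nm = 3000.0
ec99_nm = 100.0

multiple = exposure_multiple(mean_total_plasma_nm, FRACTION_UNBOUND, ec99_nm)
reaches_threshold = multiple >= THRESHOLD_MULTIPLE
print(f"exposure multiple = {multiple:.2f}, reaches threshold: {reaches_threshold}")
```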

Cutaneous leishmaniasis (CL) affects about a million people per year, 
causing skin lesions that can resolve into scar tissue. In parts of the 
Middle East, CL has reached epidemic proportions. After footpad 
infection of BALB/c mice with the dermatotropic L. major strain, 
treatment with GNF6702 at 10 mg kg⁻¹ twice-daily caused a fivefold 
decrease in footpad parasite burden and a reduction in footpad swelling 
(Fig. 2c). Both 3 mg kg⁻¹ and 10 mg kg⁻¹ twice-daily regimens of 
GNF6702 were superior to the 30 mg kg⁻¹ once-daily miltefosine regimen 
(P < 0.01), which translates into an approximately twofold higher 
miltefosine plasma concentration in mice than observed in clinical 
dosing. 

We further tested if GNF6702 can cure additional kinetoplastid 
parasite infections. An estimated 25% of the 8 million people infected 
with T. cruzi will develop chronic Chagas disease, manifesting as 
e, Whole-body in vivo imaging of bioluminescent T. brucei before and after 
treatment; T. brucei-infected mice were treated by a single intraperitoneal 
injection of diminazene aceturate (n = 3 mice) or by oral administration of 
GNF6702 once-daily for 7 days (n = 6 mice); filled symbols show whole- 
body bioluminescence values for individual mice; several mice from the 
untreated and diminazene aceturate-treated groups were euthanized 
between days 28 and 56 due to CNS infection symptoms; background 
bioluminescence values shown for uninfected mice (grey-filled squares; 
n = 4) were collected independently from mice age-matched for day 0 
using the same acquisition settings. Red dotted lines in a, c and d show the 
limit of parasite detection by qPCR; plot symbols below the red dotted line, 
mice with no detectable parasites; data points below the limit of detection 
are ‘jittered’ to show number of animals in a group; thick horizontal lines, 
means of the treatment groups; RU, relative units (parasite burden relative 
to the mean burden of the vehicle-treated group). 


cardiac or intestinal dysfunction. Benznidazole is broadly used for 
treatment of acute and indeterminate stages of Chagas disease in Latin 
America. However, benznidazole has side effects that frequently 
lead to treatment interruption, and a better tolerated drug is 
needed. To model treatment in the indeterminate disease stage, we 
infected mice with T. cruzi parasites and began treatment 35 days 
after infection, when the immune system of the mice had controlled 
parasite burden. We increased the parasite detection sensitivity by 
immunosuppressing the mice after 20 days of treatment. In this 
model, GNF6702 dosed twice-daily at 10 mg kg⁻¹ matched the efficacy 
of benznidazole at 100 mg kg⁻¹ once-daily; all but one of the treated 
mice had no detectable parasites in blood, colon or heart tissue, even 
after 4 weeks of immunosuppression (Fig. 2d). 

Finally, we tested GNF6702 in a mouse model of stage II sleeping 
sickness (human African trypanosomiasis (HAT)). Mortality of stage II 
HAT is caused by infection of the CNS and, in this mouse model, 
luciferase-expressing T. brucei parasites establish a CNS infection by 
day 21 post-infection. GNF6702 was administered at 100 mg kg⁻¹ once- 
daily to account for low exposure in the brain relative to the plasma 
(~10%, Extended Data Fig. 1b). Diminazene aceturate, a stage I 
drug that poorly crosses the blood-brain barrier, effected apparent 
clearance of parasites from the blood after a single dose, but did not 
prevent parasite recrudescence 21 days later. By contrast, treatment 
with GNF6702 for seven days caused a sustained clearance of parasites 
(days 42 and 92 post-infection in Fig. 2e, Extended Data Fig. 2a, 
Supplementary Information Tables 4 and 5). Importantly, mice treated 
Figure 3 | F24L mutation in proteasome β4 subunit confers selective 
resistance to GNF6702. a, Growth inhibition of T. cruzi epimastigote 
strains ectopically expressing wild-type PSMB4 or PSMB4^F24L protein 
by GNF6702 and bortezomib; non-induced/induced, culture medium 
without/with tetracycline to modulate expression of tetracycline- 
inducible PSMB4 genes. b, Growth inhibition of T. brucei bloodstream 
form trypomastigotes constitutively overexpressing wild-type PSMB4 
or PSMB4^F24L protein by GNF6702 and bortezomib. EC50 values for 


with GNF6702 had no detectable parasites in the brain at termination 
of the experiment, though parasites were clearly detected in the brains 
of mice treated with diminazene aceturate (Extended Data Fig. 2b, 
Supplementary Information Table 6). 

As GNF6702 showed compelling efficacy in four mouse models of 
kinetoplastid infections: VL, CL, Chagas disease and stage II HAT, 
we reasoned that mechanistic studies of GNF6702 might identify a 
pan-kinetoplastid drug target that could inform target-based drug 
discovery efforts. We attempted to evolve L. donovani strains resistant 
to GNF3943 and GNF8000 (early analogues from the series, Extended 
Data Fig. 3) through 12 months of parasite culture under drug pressure 
without success. However, we were able to select two drug-resistant 
T. cruzi epimastigote isolates, one resistant to GNF3943, and another 
to GNF8000. Both T. cruzi lines exhibited at least 40-fold lower 
susceptibility to GNF6702 than wild-type T. cruzi (Extended Data 
Fig. 4a, b). Using whole-genome sequencing, we found that the 
GNF3943-resistant line had a homozygous mutation encoding 
a substitution replacing isoleucine with methionine at amino acid 29 in the 
proteasome β4 subunit (PSMB4^I29M) and a heterozygous mutation 
P82L in dynein heavy chain gene. The GNF8000-resistant line had a 
heterozygous F24L mutation in PSMB4, and four other heterozygous 
mutations (Extended Data Table 1). We focused our attention on the 
proteasome as a likely target for the compound series because we 
found two independent mutations in the PSMB4 gene, and because 
the proteasome is an essential enzyme in eukaryotic cells. We also 
note that the Plasmodium falciparum proteasome has recently been 
the target of promising drug discovery efforts for malaria. 

We first asked whether two prototypic inhibitors of mammalian 
proteasome, bortezomib and MG132, could also block T. cruzi growth. 
Indeed, both compounds inhibited T. cruzi epimastigote proliferation 
with sub-micromolar potency. However, in contrast to GNF6702, 
bortezomib and MG132 inhibited proliferation of the two resistant lines 
(PSMB4^I29M/I29M, PSMB4^WT/F24L) with comparable potency to the 
wild-type parasites. Additionally, the PSMB4 mutant lines were not resistant 
to nifurtimox, an anti-kinetoplastid drug with an unrelated mechanism 
of action (Extended Data Fig. 4a, b). To determine whether the F24L 
mutation was sufficient to confer resistance to GNF6702, we engineered 
T. cruzi epimastigote lines that ectopically expressed either wild-type 
or F24L-mutated PSMB4. Overexpression of wild-type PSMB4 had 
little effect on the EC50 value for GNF6702, whereas overexpression 
of PSMB4^F24L caused a greater than tenfold reduction in GNF6702 
potency, but not in that of bortezomib (Fig. 3a, Extended Data 


each strain/compound pair are listed inside a and b plot panels next to 
corresponding strain/compound symbol (defined in plot legends); means 
from n=3 technical replicates are shown; error bars represent s.e.m. 
values; for data points lacking error bars, s.e.m. values are smaller than 
circles representing means; owing to limited aqueous solubility, the highest 
tested GNF6702 concentration was 10 µM. RU (relative units) in a and 

b corresponds to parasite growth relative to the DMSO control (%). 


Fig. 4c). Previously, bortezomib was also shown to inhibit the growth 
of T. brucei, suggesting that proteasome activity is essential for growth 
in this parasite as well. To test whether PSMB4^F24L can rescue 
growth inhibition by GNF6702 in T. brucei, we engineered two parasite 
strains that ectopically expressed wild-type and F24L-mutated PSMB4, 
respectively. Similar to T. cruzi, overexpression of PSMB4^F24L in 
T. brucei conferred a high level of resistance to GNF6702 (~70-fold 
shift in EC50 value), while having no effect on parasite susceptibility to 
bortezomib (Fig. 3b, Extended Data Fig. 4c). 

We next asked whether GNF6702 could inhibit any of three T. cruzi 
proteasome proteolytic activities in biochemical assays. As predicted 
from the T. cruzi genome, mass spectrometry analysis of purified 
T. cruzi proteasome identified seven alpha and seven beta proteasome 
subunits, including PSMB4 (Supplementary Tables 7 and 8). Using 
substrates that are specific for each of the chymotrypsin-like, trypsin- 
like and caspase-like proteolytic activities, we found that only the 
chymotrypsin-like activity of the T. cruzi proteasome was inhibited 
by GNF6702 (IC50 = 35 nM), while the other two activities were 
not affected (IC50 > 10 µM). In contrast, bortezomib inhibited the 
chymotrypsin-like (IC50 = 91 nM), the caspase-like (IC50 = 370 nM) 
and the trypsin-like (IC50 = 1.7 µM) activities. We further found that 
the chymotrypsin-like activity of the PSMB4^I29M T. cruzi proteasome 
was at least 300-fold less susceptible to GNF6702 (IC50 > 10 µM) and 
~3-fold less susceptible to bortezomib (IC50 = 0.26 µM), while the 
susceptibility of the other two mutant proteasome proteolytic activities to 
the two inhibitors was not affected (Fig. 4a, Extended Data Table 2). 

We reasoned that if the primary mechanism of parasite growth 
inhibition by the compound series was through inhibition of the 
proteasome chymotrypsin-like activity, then the IC50 values for this 
proteolytic activity should correlate with EC50 values for parasite 
proliferation. Indeed, a tight correlation between the two parameters 
was observed for L. donovani axenic amastigotes and T. brucei blood- 
stream form trypomastigotes (r² = 0.78 and r² = 0.67, respectively) 
over a 2,000-fold potency range for 317 analogues, thus indicating that 
inhibition of parasite proteasome activity was driving the anti-parasitic 
activity of these compounds. We observed a weaker correlation between 
IC50 and EC50 values for intracellular T. cruzi (r² = 0.36, P < 0.01), 
perhaps reflecting more complex cellular pharmacokinetics resulting 
from compounds having to access T. cruzi parasites within the cytosol 
of mammalian cells (Fig. 4b, Extended Data Fig. 5). 
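
The r² values quoted above summarise how well biochemical potency tracks cellular potency across the analogue series. The sketch below shows one common way to compute such a value: the squared Pearson correlation of log10-transformed IC50 and EC50 values. The log transform is an assumption made here because the potencies span a 2,000-fold range; the text does not state how r² was computed, and the example data are hypothetical.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Hypothetical potencies in nM for five analogues (not data from the paper)
ic50_nm = [10.0, 40.0, 150.0, 900.0, 12000.0]
ec50_nm = [25.0, 60.0, 500.0, 1500.0, 20000.0]

log_ic50 = [math.log10(v) for v in ic50_nm]
log_ec50 = [math.log10(v) for v in ec50_nm]
print(f"r^2 = {r_squared(log_ic50, log_ec50):.2f}")
```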

Both resistant T. cruzi lines retained sensitivity to bortezomib, which 
is a substrate-competitive inhibitor, suggesting that GNF6702 might 
Figure 4 | Compounds from GNF6702 series inhibit growth 
of kinetoplastid parasites by inhibiting parasite proteasome 
chymotrypsin-like activity. a, Inhibition of three proteolytic activities 
of purified wild-type (PSMB4^WT) and PSMB4^I29M T. cruzi proteasomes 
by GNF6702 and bortezomib; IC50 values for proteasome proteolytic 
activities are listed inside plots. b, Correlation between inhibition of 
chymotrypsin-like activity of purified L. donovani proteasome (IC50) 
and L. donovani axenic amastigote growth inhibition (EC50; data 
points correspond to means of 2 technical replicates); red circles, 
IC50 > 20 µM; blue circles, EC50 > 25 µM; yellow circles, IC50 > 20 µM 


have an alternative mode of inhibition. A Lineweaver-Burk plot of 
chymotrypsin-like activity at increasing concentrations of peptide 
substrate showed that GNF6702 has a non-competitive mode of 
inhibition clearly distinct from the competitive mechanism described 
for MG132 and bortezomib. We were also able to extend these 
observations to proteasome from L. donovani (Fig. 4c, Extended Data 
Table 3). We further note that GNF6702 had no measurable activity on 
the human proteasome (Fig. 4d, Extended Data Table 2). Interestingly, 
the human proteasome β4 subunit has a methionine at the 29th amino 
acid position, mirroring the I29M mutation in the GNF3943-resistant 
T. cruzi line (Extended Data Fig. 6a). 
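
A Lineweaver-Burk (double-reciprocal) plot separates the two inhibition modes because a competitive inhibitor raises the apparent Km while leaving Vmax unchanged, whereas a pure non-competitive inhibitor lowers the apparent Vmax while leaving Km unchanged. The sketch below illustrates this with textbook rate equations. All kinetic parameters are hypothetical, the function names are illustrative, and the code is not a reconstruction of the authors' analysis.

```python
def rate(substrate, vmax, km, inhibitor=0.0, ki=1.0, mode="none"):
    """Michaelis-Menten rate with an optional reversible inhibitor."""
    factor = 1.0 + inhibitor / ki
    if mode == "competitive":
        # Inhibitor competes with substrate: apparent Km rises, Vmax is unchanged.
        return vmax * substrate / (km * factor + substrate)
    if mode == "noncompetitive":
        # Inhibitor binds elsewhere: apparent Vmax falls, Km is unchanged.
        return (vmax / factor) * substrate / (km + substrate)
    return vmax * substrate / (km + substrate)


def lineweaver_burk(substrates, rates):
    """Least-squares fit of 1/v against 1/[S]; returns apparent (Vmax, Km)."""
    xs = [1.0 / s for s in substrates]
    ys = [1.0 / v for v in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                  # slope = Km / Vmax
    intercept = mean_y - slope * mean_x  # intercept = 1 / Vmax
    return 1.0 / intercept, slope / intercept


# Hypothetical parameters: Vmax = 100, Km = 10, inhibitor present at [I] = Ki
substrates = [2.0, 5.0, 10.0, 20.0, 50.0]
for mode in ("none", "competitive", "noncompetitive"):
    rates = [rate(s, 100.0, 10.0, inhibitor=1.0, ki=1.0, mode=mode) for s in substrates]
    vmax_app, km_app = lineweaver_burk(substrates, rates)
    print(f"{mode:>15}: apparent Vmax = {vmax_app:.1f}, apparent Km = {km_app:.1f}")
```

With real, noisy data a direct nonlinear fit of the rate equation is usually preferred for parameter estimates, since the double-reciprocal transform amplifies error at low substrate concentrations; the plot remains useful as a visual diagnostic of inhibition mode.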

In summary, GNF6702 blocks the chymotrypsin-like activity 
harboured by the β5 subunit without competing with substrate binding, 
and mutations in the β4 subunit, which is in direct physical contact 
with the β5 subunit, confer resistance to this inhibition. Next we used 
homology modelling of the T. cruzi proteasome to look for evidence of 
an allosteric inhibitor binding site. In the T. cruzi proteasome model, 
the F24 and I29 β4 residues are positioned at the interface between the 
β4 and β5 subunits, on the outer limit of the β5 active site. Adjacent to 
these two β4 residues and the β5 active site is a plausible binding pocket 
for GNF6702 (Extended Data Fig. 6b, c). 

Finally, we tested whether GNF6702 can inhibit proteasome activity 
in intact T. cruzi cells. Cellular proteins entering the proteasome 
degradation pathway are first tagged with ubiquitin, and proteasome 
inhibition results in intracellular accumulation of ubiquitylated 
proteins. Treatment of T. cruzi epimastigotes with GNF6702 led to sub- 
stantial build-up of ubiquitylated proteins (Extended Data Fig. 7a) with 
the half-maximal effect (EC50) achieved at 130 nM compound 
concentration (Extended Data Fig. 7c). This EC50 value correlated well with the 
half-maximal growth inhibitory concentration of GNF6702 on T. cruzi 
epimastigotes (EC50 = 150 nM; Extended Data Fig. 4b). For comparison, 
similar experiments with bortezomib yielded comparable inhibitor 
and EC50 > 25 µM; data for 317 analogues are shown. c, Lineweaver-Burk 
plot of inhibition of T. cruzi proteasome chymotrypsin-like activity by 
GNF6702 at increasing concentrations of a peptide substrate. d, Effect 
of GNF6702 and bortezomib on three proteolytic activities of human 
constitutive proteasome; IC50 values for proteasome proteolytic activities 
are listed inside plots. Data shown in a, c and d represent means ± s.e.m. 
(n = 3 technical replicates; for data points lacking error bars, s.e.m. values 
are smaller than circles representing means). Owing to limited aqueous 
solubility, the highest tested GNF6702 concentration in experiments 
shown in a and d was 10 µM. 


potencies in the two T. cruzi assays (ubiquitylation EC50 = 62 nM versus 
growth inhibition EC50 = 160 nM; Extended Data Figs 4b and 7c). We 
did not observe any detectable accumulation of ubiquitylated proteins in 
mammalian 3T3 cells treated with GNF6702 (Extended Data Fig. 7b, c), 
further confirming the high selectivity of this compound. 

Validation of the parasite proteasome as the target of GNF6702 is 
supported through several lines of evidence: (i) point mutations in 
the PSMB4 gene are sufficient to confer resistance to biochemical 
proteasome inhibition and cellular T. cruzi growth inhibition; 
(ii) GNF6702 is a selective inhibitor of parasite proteasome activity 
and does not inhibit the human proteasome, mirroring the selective 
inhibition of parasite growth over mammalian cell growth; and 
(iii) potency of GNF6702 and analogues in parasite proteasome assays 
predict potency in parasite growth-inhibition assays. 

In this work we show that in mouse disease models, GNF6702 
was able to eradicate parasites from diverse niches that included the 
cytosol (T. cruzi), phagolysosome (L. donovani, L. major) of infected 
host cells, and brain (T. brucei). GNF6702 also has good pharmacokinetic 
properties, and the compound did not show activity in panels of 
human receptor, enzyme and ion channel assays (Supplementary Tables 
9-11). Going forward, GNF6702, or analogues thereof, has potential 
to yield a new treatment for several kinetoplastid infections and it is 
currently being evaluated in preclinical toxicity studies. It is unclear if 
the clinical utility of GNF6702 could extend to the treatment of stage 
II HAT as GNF6702 was tested in the HAT mouse model only at one 
high dose (100 mg kg⁻¹ once-daily). We also note that identification 
of a broadly active pan-kinetoplastid drug might not be feasible 
(or desirable) as such a drug would need to reach high concentra- 
tions in varied tissues/subcellular compartments, and might carry 
increased toxicity risk. Instead, alternative analogues from this series 
with different pharmacological profiles might be needed for treatment 
of different kinetoplastid infections. Nevertheless, there are only scarce 
resources for drug development in these diseases, and identification of 
a common target and chemical scaffold with potential across multiple 
indications provides new hope for improved treatment options for 
some of the world’s poorest people. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


No statistical methods were used to predetermine sample size. 

Ethics statement for animal models. All procedures involving mice were 
performed in accordance with AAALAC standards or under UK Home Office 
regulations, and were reviewed and approved in accordance with the Novartis 
Animal Welfare Policy. Sample size was determined on the basis of the minimum 
number of animals required for good data distribution and statistics. Blinding 
was not possible in these experiments but animals were selected randomly for 
each group. 

Determination of IC50, EC50 and CC50 values. Reported IC50/EC50/CC50 values 
were calculated by averaging IC50/EC50/CC50 values obtained from individual 
technical replicate experiments (n; specified in relevant Figure captions and 
Methods sub-sections). Each technical replicate experiment was performed on a 
different day with freshly prepared reagents. Reported standard errors of the mean 
(s.e.m.) were calculated using IC50/EC50/CC50 values determined in individual 
technical replicate experiments. To calculate IC50/EC50/CC50 values, measured 
dose response values were fitted with a 4-parameter logistic function 

$y = A + \frac{B - A}{1 + (x/C)^{D}}$ 

(model 201, XLfit, IDBS), where x refers to compound concentration and 
y corresponds to an assay readout value. 
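
The procedure described above has two parts: a four-parameter logistic model for each dose-response curve, and averaging of the per-replicate potency values with a standard error of the mean. The sketch below illustrates both under stated assumptions. The parameter names (a, b, c, d) and the exact functional form are a common four-parameter logistic convention and may differ from the XLfit model 201 parameterisation; the replicate values are hypothetical. Curve fitting itself is not shown.

```python
import math
import statistics


def four_parameter_logistic(x, a, b, c, d):
    """Four-parameter logistic curve (assumed form, see the note above).

    a and b are the two plateaus, c is the concentration giving the
    half-maximal response (IC50/EC50/CC50), and d is the slope factor.
    """
    return a + (b - a) / (1.0 + (x / c) ** d)


def mean_and_sem(values):
    """Mean and standard error of the mean across technical replicates."""
    if len(values) < 2:
        raise ValueError("s.e.m. needs at least two replicate values")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# At x == c the curve sits halfway between the two plateaus.
midpoint = four_parameter_logistic(35.0, a=0.0, b=100.0, c=35.0, d=1.0)

# Hypothetical IC50 values (nM) from three replicate experiments run on different days
replicate_ic50_nm = [30.0, 35.0, 40.0]
mean_ic50, sem_ic50 = mean_and_sem(replicate_ic50_nm)
print(f"midpoint response = {midpoint:.1f}")
print(f"IC50 = {mean_ic50:.1f} +/- {sem_ic50:.1f} nM (mean +/- s.e.m., n = {len(replicate_ic50_nm)})")
```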

Leishmania donovani axenic amastigote growth-inhibition assay. RPMI-1640 
medium (HyClone) was supplemented with 20% heat-inactivated fetal bovine 
serum (Omega Scientific), 23 µM folic acid (Sigma-Aldrich), 100 µM adenosine 
(Sigma-Aldrich), 22 mM D-glucose (Sigma-Aldrich), 4 mM L-glutamine (HyClone), 
25 mM 2-(4-morpholino) ethanesulfonic acid (Sigma-Aldrich) and 100 IU 
penicillin/100 µg/ml streptomycin (HyClone), and adjusted to pH 5.5 with 
6 M hydrochloric acid (Fisher Scientific) at 37 °C. Leishmania donovani MHOM/ 
SD/62/1S-CL2D axenic amastigotes were cultured in 10 ml of this medium (Axenic 
Amastigote Medium) in T75 CELL-STAR flasks (Greiner Bio-One) at 37 °C/ 
5% CO2 and passaged once a week. 

To determine compound growth inhibitory potency on L. donovani axenic 
amastigotes, 100 nl of serially diluted compounds in DMSO were transferred to 
the wells of white, solid bottom 384-well plates (Greiner Bio-One) by Echo 555 
acoustic liquid handling system (Labcyte). Then, 1 x 10° of L. donovani axenic 
amastigotes in 40 µl of Axenic Amastigote Medium were added to each well, and 
plates were incubated for 48 h at 37 °C/5% CO2. Parasite numbers in individual 
plate wells were determined through quantification of intracellular ATP. The 
CellTiter-Glo luminescent cell viability reagent (Promega) was added to plate wells, 
and ATP-dependent luminescence signal was measured on an EnVision MultiLabel 
Plate Reader (Perkin Elmer). Luminescence values in wells with compounds were 
divided by the average luminescence value of the plate DMSO controls, and used 
for calculation of compound EC50 values as described above. 
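
The normalisation step described above divides each compound well's luminescence by the mean of the DMSO control wells on the same plate. A minimal sketch follows; the function name and the luminescence readings are hypothetical.

```python
from statistics import mean


def relative_growth(well_signals, dmso_control_signals):
    """Express each well's luminescence as a fraction of the plate DMSO mean."""
    control_mean = mean(dmso_control_signals)
    if control_mean <= 0:
        raise ValueError("DMSO control mean must be positive")
    return [signal / control_mean for signal in well_signals]


# Hypothetical raw luminescence readings from one plate
dmso_controls = [1000.0, 1040.0, 960.0]
compound_wells = [900.0, 500.0, 100.0]   # increasing compound concentration

print(relative_growth(compound_wells, dmso_controls))
```

The resulting fractions are the readout values that would then be fitted with the dose-response model to obtain an EC50.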

Axenic amastigote EC50 values shown in Fig. 4b correspond to means of 
2 technical replicates. 

Isolation and maintenance of Leishmania donovani splenic amastigotes. Female 
BALB/cJ mice (Envigo) infected with L. donovani MHOM/ET/67/HU3 (ATCC) for 
50-80 days were euthanized, and infected spleens were removed and weighed. The 
weight of an infected spleen ranged from 300 to 600 mg. For comparison, spleens 
from non-infected age-matched BALB/cJ mice weighed ~100 mg. Infected spleens 
were washed in Axenic Amastigote Medium (composition described above) and 
placed into Falcon 50 ml conical centrifuge tubes (Fisher Scientific) containing 
ice-cold Axenic Amastigote Medium (15 ml per infected spleen). Spleens were 
homogenized on ice in a Dounce homogenizer and centrifuged at 200g for 
15min at 4°C to remove tissue debris. Leishmania donovani amastigotes present 
in the supernatant were pelleted by centrifugation at 1,750g for 15 min at 4°C 
and re-suspended either in Axenic Amastigote Medium (when used for in vitro 
macrophage infections) or in Hanks’ Balanced Salt Solution (when used for mouse 
infections; Hyclone). Suspensions of splenic amastigotes were kept on ice and used 
for in vitro or in vivo infections within 2-3 h. To propagate L. donovani amastigotes 
in vivo, 6-7-week-old female BALB/cJ mice were infected with 8 x 10’ purified 
splenic amastigotes in 200 µl of Hanks’ Balanced Salt Solution by tail vein injection. 

Leishmania donovani intra-macrophage amastigote growth-inhibition assay. 
In vitro compound potencies on intra-macrophage L. donovani MHOM/ET/67/ 
HU3 were determined using primary murine peritoneal macrophages infected 
with L. donovani splenic amastigotes. Primary macrophages were elicited in female 
BALB/cJ mice for 72 h following the injection of 500 µl of sterile aqueous 2% starch 
(J. T. Baker) solution into the mouse peritoneal cavity. The protocol used for 
isolation of peritoneal macrophages was described in detail previously. The 
isolated macrophages were re-suspended in Macrophage Infection Medium (RPMI- 
1640 medium supplemented with 2 mM L-glutamine, 10% heat-inactivated fetal 
bovine serum, 10 mM sodium pyruvate (HyClone), and 100 IU penicillin/100 µg/ml 
streptomycin), and 50 µl of macrophage suspension (4 x 10° macrophages/ml) were 


added to microscopy-grade, clear-bottom, black 384-well plates (Greiner Bio-One). 
Following overnight incubation at 37°C/ 5% COn, plate wells were washed with 
Macrophage Infection Medium to remove non-adherent cells using ELx405 Select 
microplate washer (BioTek), and then filled with 4011 of Macrophage Infection 
Medium. Leishmania donovani HU3 splenic amastigotes isolated from infected 
spleens were re-suspended in Macrophage Infection Medium at a concentration 
of 6 x 10⁷ cells/ml, and 10 µl of the suspension were added to assay plate wells 
containing adherent macrophages. After a 24-h infection period at 37 °C/ 
5% CO2, plate wells were washed with Macrophage Infection Medium to remove 
residual extracellular parasites and re-filled with 50 µl of the medium. Leishmania 
donovani-infected macrophages were subsequently treated with DMSO-dissolved 
compounds (0.5% final DMSO concentration in the assay medium) in dose 
response for 120 h at 37 °C/5% CO2. Next, treated macrophages were washed 
with the phosphate-buffered saline buffer (PBS; Sigma-Aldrich) supplemented 
with 0.5mM magnesium chloride (Sigma-Aldrich) and 0.5 mM calcium chloride 
(Sigma-Aldrich), fixed with 0.4% paraformaldehyde (Sigma-Aldrich) in PBS, 
permeabilized with 0.1% Triton X-100 (Sigma-Aldrich) in PBS, and stained with 
SYBR Green I nucleic acid stain (Invitrogen, 1:100,000 dilution in PBS) overnight 
at 4°C. Image collection and enumeration of macrophage cells and intracellular 
L. donovani amastigotes was performed using the OPERA QEHS automated 
confocal microscope system equipped with 20x water immersion objective 
(Evotec Technologies) and the OPERA Acapella software (Evotec Technologies) 
as described previously**. 

All reported intra-macrophage L. donovani EC50 values were calculated from 
at least 3 technical replicates (n = 3 or n = 4; specified in relevant figure captions). 
Trypanosoma brucei growth inhibition assay. Bloodstream form Trypanosoma 
brucei Lister 427 parasites were continuously passaged in HMI-9 medium 
formulated from IMDM medium (Invitrogen), 10% heat-inactivated fetal 
bovine serum, 10% Serum Plus medium supplement (SAFC Biosciences), 1 mM 
hypoxanthine (Sigma-Aldrich), 50 µM bathocuproine disulfonic acid 
(Sigma-Aldrich), 1.5 mM cysteine (Sigma-Aldrich), 1 mM pyruvic acid (Sigma-Aldrich), 
39 µg/ml thymidine (Sigma-Aldrich), and 14 µl/l β-mercaptoethanol 
(Sigma-Aldrich); all concentrations of added components refer to those in complete HMI-9 
medium. The parasites were cultured in 10 ml of HMI-9 medium in T75 
CELLSTAR tissue culture flasks at 37 °C/5% CO2. 

To determine compound growth inhibitory potency on T. brucei bloodstream 
form parasites, 100 nl of serially diluted compounds in DMSO were transferred 
to the wells of white, solid bottom 384-well plates (Greiner Bio-One) by Echo 555 
acoustic liquid handling system. Then, 5 x 10° of T. brucei parasites in 40 µl of 
HMI-9 medium were added to each well, and the plates were incubated for 48 h at 
37 °C/5% CO2. Parasite numbers in individual plate wells were determined through 
quantification of intracellular ATP amount. The CellTiter-Glo luminescent cell 
viability reagent was added to plate wells, and ATP-dependent luminescence signal 
was measured on an EnVision MultiLabel Plate Reader. Luminescence values in 
wells with compounds were divided by the average luminescence value of the 
plate DMSO controls, and used for calculation of compound EC50 values as 
described above. 
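
The normalization step can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names are hypothetical, and the EC50 is estimated here by log-linear interpolation between the two concentrations that bracket 50% of the DMSO-control signal. That interpolation is an assumption made for illustration; it stands in for the curve-fitting procedure that the text refers to as "described above".

```python
import math


def normalize_to_dmso(signals, dmso_controls):
    """Divide each well signal by the mean signal of the plate's DMSO control wells."""
    mean_control = sum(dmso_controls) / len(dmso_controls)
    return [s / mean_control for s in signals]


def interpolate_ec50(concentrations, normalized):
    """Estimate the concentration giving 50% of control signal.

    Uses log-linear interpolation between the two bracketing data points
    (an illustrative simplification, not a four-parameter curve fit).
    Returns None when the data never cross 50%.
    """
    points = sorted(zip(concentrations, normalized))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(points, points[1:]):
        if y_lo >= 0.5 >= y_hi and y_lo != y_hi:
            frac = (y_lo - 0.5) / (y_lo - y_hi)
            log_ec50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ec50
    return None
```

The same normalization applies to the absorbance-based and luminescence-based readouts in the other growth-inhibition assays, because each divides the compound-well signal by the mean of the plate DMSO controls.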

Trypanosoma brucei EC50 values shown in Fig. 1 and Extended Data Fig. 3 
correspond to means of 4 technical replicates. 
Trypanosoma cruzi amastigote growth-inhibition assay. NIH 3T3 fibroblast 
cells (ATCC) were maintained in RPMI-1640 medium (Life Technologies) 
supplemented with 10% heat-inactivated fetal bovine serum and 100 IU penicillin/ 
100 µg/ml streptomycin at 37 °C/5% CO2. Trypanosoma cruzi Tulahuen parasites 
constitutively expressing Escherichia coli β-galactosidase** were maintained in 
tissue culture as an infection in NIH 3T3 fibroblast cells. Briefly, 2 x 10⁷ T. cruzi 
trypomastigotes were used to infect 6 x 10° NIH 3T3 cells growing in T75 
CELLSTAR tissue culture flasks and cultured at 37 °C/5% CO2 until proliferating 
intracellular parasites lysed host 3T3 cells and were released into the culture 
medium (typically 6-7 days). During the infection, the tissue culture medium 
was changed every two days. The number of T. cruzi trypomastigotes present in 1 ml 
of medium was determined using a haemocytometer. 

To determine compound potency on intracellular T. cruzi amastigotes, NIH 
3T3 cells were re-suspended in phenol red-free RPMI-1640 medium containing 
3% heat-inactivated fetal bovine serum and 100 IU penicillin/100 µg/ml 
streptomycin, seeded at 1,000 cells/well (40 µl) in white, clear bottom 384-well 
plates (Greiner Bio-One), and incubated overnight at 37 °C/5% CO2. The following 
day, 100 nl of each compound in DMSO were transferred to individual plate wells 
by Echo 555 acoustic liquid handling system. After one hour incubation, 1 x 10° of 
tissue culture-derived T. cruzi trypomastigotes, in 10 µl of phenol red-free RPMI-1640 
medium supplemented with 3% heat-inactivated fetal bovine serum and 100 IU 
penicillin/100 µg/ml streptomycin were added to each well. Plates were then 
incubated for 6 days at 37 °C/5% CO2. Intracellular T. cruzi parasites were quantified 
by measuring the activity of parasite-expressed β-galactosidase. Ten microlitres 
of a chromogenic β-galactosidase substrate solution (0.6 mM chlorophenol 
red-β-D-galactopyranoside/0.6% NP-40 in PBS; both reagents from Calbiochem) 
were added to each well and incubated for 2 h at room temperature. After 
incubation, absorption was measured at 570 nm on a SpectraMax M2 plate reader 
(Molecular Devices). Measured absorbance values in wells with compounds were 
divided by the average absorbance value of the plate DMSO controls, and used for 
calculation of compound EC50 values as described above. 

Trypanosoma cruzi amastigote EC50 values shown in Fig. 1 and Extended Data 
Fig. 3 correspond to means of 4 technical replicates. 
Trypanosoma cruzi epimastigote proliferation assay. Trypanosoma cruzi CL 
epimastigotes were continuously passaged in LIT medium containing 9 g/l liver 
infusion broth (Difco), 5 g/l bacto tryptose (Difco), 1 g/l sodium chloride, 8 g/l 
dibasic sodium phosphate (Sigma-Aldrich), 0.4 g/l potassium chloride 
(Sigma-Aldrich), 1 g/l D-glucose, 10% heat-inactivated fetal bovine serum and 
10 ng/ml of hemin (Sigma-Aldrich). The medium was adjusted to pH 7.2 with 6 M 
hydrochloric acid. The parasites were cultured in 10 ml of LIT medium in T75 
CELLSTAR tissue culture flasks at 27 °C. 

To determine compound growth inhibitory potency on T. cruzi epimastigotes, 
100 nl of serially diluted compounds in DMSO were transferred to the wells of 
white, solid bottom 384-well plates (Greiner Bio-One) by an Echo 555 acoustic 
liquid handling system. Then, 5 x 10° of T. cruzi epimastigotes in 40 µl of LIT 
medium were added to each well, and the plates were incubated for 7 days at 27°C. 
Parasite numbers in individual plate wells were determined through quantification 
of intracellular ATP amount. The CellTiter-Glo luminescent cell viability reagent 
was added to plate wells, and ATP-dependent luminescence signal was measured 
on an EnVision MultiLabel Plate Reader. Luminescence values in wells with 
compounds were divided by the average luminescence value of the plate DMSO 
controls, and used for calculation of compound EC50 values as described above. 

Trypanosoma cruzi epimastigote EC50 values shown in Extended Data Fig. 4 
correspond to means of 3 technical replicates. 
Mouse fibroblast NIH 3T3 growth-inhibition assay. NIH 3T3 fibroblast cells 
were maintained in RPMI-1640 medium with glutamine (Life Technologies) 
supplemented with 5% heat-inactivated fetal bovine serum and 100 IU penicillin/ 
100 µg/ml streptomycin (3T3 medium) at 37 °C/5% CO2. NIH 3T3 fibroblast cells 
were purchased from ATCC. We did not perform cell line authentication and did 
not test the cells for mycoplasma contamination. This cell line is not listed in the 
database of commonly misidentified cell lines maintained by ICLAC and NCBI 
Biosample. 

To determine compound potency, NIH 3T3 cells re-suspended in 3T3 
medium were seeded at 1,000 cells/well (50 µl) in white 384-well plates (Greiner 
Bio-One) and incubated overnight at 37 °C/5% CO2. The following day, 100 nl 
of each compound in DMSO were transferred to individual plate wells by Echo 
555 acoustic liquid handling system and plates were incubated for five days at 
37 °C/5% CO2. Cell numbers in individual plate wells were determined through 
quantification of intracellular ATP amount. The CellTiter-Glo luminescent cell 
viability reagent was added to plate wells, and ATP-dependent luminescence signal 
was measured on an EnVision MultiLabel Plate Reader. Luminescence values in 
wells with compounds were divided by the average luminescence value of the 
plate DMSO controls, and used for calculation of compound CC50 values as 
described above. 

NIH 3T3 CC50 values shown in Fig. 1 and Extended Data Fig. 3 correspond to 
means of 4 technical replicates. 
Primary macrophage cytotoxicity assay. Primary macrophage cell viability was 
determined on mouse peritoneal macrophages infected with L. donovani and was 
expressed as the ratio of the number of macrophage cells in wells treated with a 
compound to those in wells treated with DMSO. The number of macrophage cells 
in wells was determined by high content microscopy as described previously”. 

All reported macrophage CC50 values were calculated from 4 technical replicates 
(n = 4; also specified in Fig. 1 and Extended Data Fig. 3 captions). 

Selection of GNF3943- and GNF8000-resistant T. cruzi mutants. Trypanosoma 
cruzi epimastigote cultures resistant to GNF3943 and GNF8000 were generated 
using a methodology described previously. Briefly, epimastigotes were initially 
cultured in the presence of compound at a concentration equivalent to its EC20 value 
(GNF3943 EC20 = 1.5 µM and GNF8000 EC20 = 0.2 µM in 0.2% DMSO) or 0.2% 
DMSO (control). Once a week, parasites were counted and growth rates were 
determined. If the parasite cultures exhibited a reduced growth rate compared to 
0.2% DMSO-treated parasites, epimastigotes were cultured at the same compound 
concentration. Once the growth rates matched that of the control epimastigote 
culture (0.2% DMSO), parasites were transferred into medium containing twofold 
higher compound concentration. The process was repeated until substantial 
resistance was achieved (~10- to 20-fold increase in corresponding EC50 value). 
The time required for generation of cultures with such a level of resistance was 
approximately five months. Resistant clones were isolated via cloning by limiting 
dilution, and two independent clones were analysed by whole-genome sequencing. 
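
The weekly escalation rule in this selection scheme can be sketched as a small decision function. This is an illustration only, not the authors' protocol code: the function name is hypothetical, and the `tolerance` parameter (how close the treated growth rate must be to the control rate to count as "matched") is an assumption, since the text does not state a numerical criterion.

```python
def next_concentration(current_conc, treated_growth_rate, control_growth_rate,
                       tolerance=0.05):
    """Return the compound concentration to use for the next week of culture.

    If the treated culture still grows more slowly than the DMSO control,
    the concentration is held; once growth matches the control (within the
    assumed tolerance), the concentration is doubled.
    """
    if treated_growth_rate >= control_growth_rate * (1 - tolerance):
        return current_conc * 2
    return current_conc
```

Starting from the EC20 concentration, repeated doubling under this rule raises the selection pressure only after the culture has adapted to the current level.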
T. cruzi whole-genome sequencing. Chromosomal DNA isolation from 
GNF3943- and GNF8000-resistant T. cruzi clones, whole-genome sequencing and 
sequence analysis were performed as described previously”. Sequencing reads 
were aligned to the T. cruzi CL Brenner genome™*. 

Generation of T. cruzi strains ectopically expressing proteasome β4 subunit 
variants. PSMB4 TcCLB503891.100 was amplified from T. cruzi CL Brenner 
genomic DNA using KOD Hot Start DNA Polymerase (EMD Millipore), and 
sense (5’-AAAGCGGCCGCATGTCGGAGACAACCATTG-3’) and antisense 
(5’-CCATGATCTTGATGTAATATAAGGCATTCAGCCCTGCTG-3’) primers. 
The mutant PSMB4 gene was generated from the wild-type PSMB4 construct by 
site-directed mutagenesis using mutagenic sense 
(5’-CAGCAGGGCTGAATGCCTTATATTACATCAAGATCATGG-3’) and antisense 
(5’-CCATGATCTTGATGTAATATAAGGCATTCAGCCCTGCTG-3’) primers and the QuikChange II 
Site-Directed Mutagenesis Kit (Stratagene). The sequences of the wild-type and mutant 
PSMB4 genes were verified by sequencing and both gene versions were subcloned 
into the T. cruzi expression vector pTcIndex1 under control of a T7 promoter*’. 
Trypanosoma cruzi CL Brenner epimastigotes were first transfected as described 
previously** with the pLEW13 plasmid*” harbouring a tetracycline-inducible 
T7 RNA polymerase gene. Transfected epimastigotes were selected in medium 
supplemented with neomycin (G418) at 500 µg/ml, and then transfected a second 
time with either the wild-type or the mutant pTcIndex1-PSMB4 plasmid. 
Double-transfected epimastigotes were selected in the presence of 500 µg/ml of G418 
(Sigma-Aldrich) and 500 µg/ml of hygromycin (Sigma-Aldrich). Susceptibility 
of double transfected epimastigote cell lines to compounds was assessed using 
induced (+5 mg/ml of tetracycline) and non-induced parasite cultures after five 
days of compound treatment. Parasite viability was determined with AlamarBlue 
(ThermoFisher Scientific). 

Reported EC50 values for T. cruzi epimastigotes ectopically expressing PSMB4 
proteins were calculated from 3 technical replicates (n = 3; also specified in the 
Fig. 3a caption). 
Generation of T. brucei strains ectopically expressing proteasome β4 subunit 
variants. PSMB4 (Tb927.10.4710) was amplified from T. brucei Lister 
427 genomic DNA using PCR SuperMix High Fidelity (Invitrogen), sense 
(5’-GCAAGCTTATGGCAGAGACGACTATCGG-3’) and antisense 
(5’-GCGGATCCCTAGCTTACAGATTGCACTC-3’) primers. The mutant PSMB4 
gene was generated from the wild-type PSMB4 construct by site-directed 
mutagenesis using mutagenic sense 
(5’-GCTGCGGGGTTAAATGCGTTATACTACATTAAGATAACGG-3’) and antisense 
(5’-CCGTTATCTTAATGTAGTATAACGCATTTAACCCCGCAGC-3’) primers and the QuikChange II Site-Directed 
Mutagenesis Kit (Stratagene). The sequences of the wild-type and mutant PSMB4 
genes were verified by sequencing and both gene versions were cloned into the 
T. brucei expression vector pHD1034 under control of a ribosomal RNA promoter. 
Transfected T. brucei Lister 427 cells were selected in medium supplemented with 
puromycin at 1 µg/ml. Susceptibility of transfected T. brucei cell lines to compounds 
was assessed after 2 days of compound treatment. Parasite viability was determined 
with CellTiter-Glo. 

Reported EC50 values for T. brucei parasites ectopically expressing PSMB4 
proteins were calculated from 3 technical replicates (n = 3; also specified in the 
Fig. 3b caption). 

Purification of parasite 20S proteasomes. Trypanosoma cruzi CL epimastigotes, 
L. donovani MHOM/SD/62/1 S-CL2D axenic amastigotes and T. brucei Lister 427 
bloodstream form trypomastigotes were grown to log phase and harvested by 
centrifugation. The corresponding cell pellets were stored at −80 °C until further 
use. Prior to purification, 10 g of cell pellets were thawed, re-suspended in lysis 
buffer (50 mM Tris-HCl pH 7.5, 1 mM TCEP, 5 mM EDTA, and 10 µM E-64), 
and lysed by passing cell suspension three times through a needle (22 gauge) and 
by subsequent three freeze/thaw cycles. The lysate was first cleared of cellular 
debris by two centrifugation steps (15,000g at 4°C for 15 min followed by 40,000g at 
4°C for 60 min) and then fractionated through ammonium sulphate precipitation. 
The protein fraction precipitated between 45% and 65% of ammonium sulphate 
saturation was re-suspended in 25 mM Tris-HCl pH =7.5, 1mM TCEP buffer, 
and dialysed overnight at 4°C against the same buffer. Proteasomes were further 
purified by anion exchange chromatography (Resource Q column, GE Healthcare 
Life Sciences) and size-exclusion chromatography (Superose 6 column, GE 
Healthcare Life Sciences) as described elsewhere*’. Active fractions from the latter 
purification step were pooled and used in proteasome biochemical assays. 

Subunit composition analysis of purified T. cruzi 20S proteasome by LC-MS/MS. 
Purified T. cruzi proteasome sample was buffer-exchanged and concentrated 
into 100 mM trimethylamine bicarbonate-HCl pH = 8.0, 150 mM NaCl buffer 
using a 10 kDa molecular weight cut-off micro-concentrator (Millipore Amicon 
Ultra). The resulting proteasome sample (200 µl, 1 mg/ml) was mixed with 5 µl of 
a TMTsixplex reagent (Pierce). After 60 s incubation to label primary amines, the 
reaction was stopped by adding 25 µl of 5% hydroxylamine. The labelled sample 
was run on a 4–20% Bis-Tris PAGE gel (Invitrogen) to separate polypeptides. The 
gel was stained with eStain 2.0 (GenScript). Stained protein bands were cut out 
and in-gel-digested separately with elastase (Promega) and asparaginase (Roche). 
Peptides generated by the digestions were resolved by HPLC using a vented column 
setup with a 2cm Poros 10 R2 (Life Technologies, Carlsbad, CA) self-packed 
pre-column, and a PepMap Easy-Spray C18 analytical column (15 cm x 75 µm 
ID, Thermo Scientific). Resin-bound proteolytic fragments were eluted with 2 to 
40% acetonitrile / 0.1% formic acid operated at 300 nl/min for 120 min. Spectra of 
eluted peptide species were determined by a column-coupled Q Exactive hybrid 
quadrupole orbitrap mass spectrometer (Thermo Scientific). Proteome Discoverer 
v1.4 software (Thermo Scientific) was used to search the T. cruzi genome” with 
identified spectra for presence of 20S proteasome subunits (Supplementary 
Table 7). Search parameters included fixed carbamidomethyl modification of 
cysteine, and variable oxidation of methionine, deamidation of asparagine, 
pyro-glu of N-terminal glutamine, and TMT(6-plex) modification of lysine residues. 
Measuring proteasome proteolytic activities. The activity of purified parasite 
and human 20S proteasomes was monitored by measuring cleavage of various 
rhodamine-labelled fluorogenic substrates. Purified 20S proteasomes were diluted 
in proteasome assay buffer (25 mM Tris-HCl pH 7.5, 1 mM dithiothreitol (Sigma- 
Aldrich), 10mM sodium chloride, 25 mM potassium chloride, 1 mM magnesium 
chloride, 0.05% (w/v) CHAPS (Sigma-Aldrich) and 0.9% DMSO) at a final 
concentration of 162 nM (parasite proteasomes) or 25nM (human proteasome), 
and pre-incubated with compound (40 nl; 0.2% final DMSO concentration) for 
1h. Next, the following substrates (Biosynthan GmbH) were added at 341M final 
concentration to monitor specific proteolytic activities (Suc-LLVY-Rh110-dPro: 
chymotrypsin-like activity; Ac-RLR-Rh110-dPro: trypsin-like activity; Ac-GPLD- 
Rh110-dPro: caspase-like activity). The reaction was allowed to proceed for 
two hours at room temperature and fluorescence as a measure of purified 20S 
proteasome activity was monitored using the EnVision plate reader (excitation at 
485 nm/emission at 535 nm). Km and Ki values were calculated using GraphPad 
Prism (GraphPad Software) ‘non-competitive enzyme inhibition’ function. 

Data shown in Fig. 4a, c, d and Extended Data Table 3 represent means of 
3 technical replicates (n = 3). Data shown in Fig. 4b and Extended Data Fig. 5 
represent means of 2 technical replicates (n = 2). 
Monitoring accumulation of ubiquitylated proteins in intact cells. Growing 
T. cruzi CL epimastigotes were seeded into 24-well tissue culture plate (1 x 107 
cells per well) in LIT medium and treated for 2-12 h with DMSO (0.2%) or various 
concentrations of bortezomib and GNF6702 at 27°C. Following the treatment, 
parasites were collected by centrifugation (3,500g for 6 min) and washed twice 
with phosphate-buffered saline (PBS). Epimastigotes were lysed by resuspending 
washed cells in a buffer containing 50 mM Tris-HCl pH 7.4, 150 mM sodium 
chloride, 1% CHAPS, 20 µM E-64 (Sigma-Aldrich), 10 mM EDTA (Sigma-Aldrich), 
5 mM N-ethylmaleimide (Sigma-Aldrich), 1 mM phenylmethylsulfonyl fluoride 
(Sigma-Aldrich), 10 µg/ml leupeptin (Sigma-Aldrich), 10 µg/ml aprotinin 
(Sigma-Aldrich), and incubating the suspension on ice for 20 min. Cell lysates were cleared 
by centrifugation at 21,000g for 30 min at 4°C. 

For 3T3 cells, 2 x 10° cells/well were seeded into 24-well tissue culture plates 
in RPMI-1640 medium supplemented with 10% heat-inactivated fetal bovine 
serum, and incubated overnight at 37°C to allow cells to attach. Attached cells were 
treated for 2h with DMSO (0.25%) or various concentrations of bortezomib and 
GNF6702. Treated cells were washed twice with PBS and then lysed by incubating 
cells in modified RIPA buffer (50mM Tris-HCl pH =7.4, 1% Triton X-100, 0.2% 
sodium dodecylsulfate, 1 mM EDTA, 1 mM phenylmethylsulfonyl fluoride, 5 µg/ml 
aprotinin, 5 µg/ml leupeptin) for 30 min at 4 °C. Cell lysates were cleared by 
centrifugation at 21,000g for 30 min at 4°C. 

Protein concentration in cell extracts was determined with BCA assay 
(ThermoFisher), and 10 µg of cell extracts were loaded on NuPAGE Novex 4-12% 
Bis-Tris gel (Invitrogen). After electrophoresis, resolved proteins were transferred 
to nitrocellulose membrane. Ubiquitylated proteins were detected with polyclonal 
anti-ubiquitin primary antibody (Proteintech, catalogue number 10201-2-AP) and 
rabbit anti-mouse IgG-peroxidase antibody (Sigma-Aldrich, catalogue number 
A0545), and then imaged using ECL Prime Western Blotting Detection Reagent 
(Amersham) on Chemidoc XR+ imaging system (BioRad). Collected western blot 
images were quantified using Image Lab software (BioRad). Briefly, rectangles of 
identical size and shape were drawn around each blot lane to include inside the 
shape all ubiquitylated protein bands within 17-198 kDa molecular mass range. 
Next, integrated signal intensities within the rectangles (reported by the Image 
Lab software) were used for calculation of EC50 values. Three technical replicate 
experiments (n = 3) for each different dose response experiment (GNF6702 on 
T. cruzi epimastigotes; GNF6702 on 3T3 cells; bortezomib on T. cruzi epimastigotes; 
bortezomib on 3T3 cells) were performed. 
Trypanosoma cruzi proteasome modelling studies. The homology model of 
T. cruzi 20S proteasome was built using ‘Prime’ protein structure prediction 
program (Schrödinger) and X-ray structure of bovine 20S proteasome (PDB 
accession code 1IRU)*? as the template. The model was subjected to restrained 
minimization to relieve inter-chain clashes. ‘SiteMap’ program (Schrödinger) was 
used to identify pockets on the protein surface suitable for small molecule binding. 
Flexible ligand docking was performed using ‘Glide 5.8’ (Schrödinger). The 
grid box was centred in the middle of the identified pocket and extended by 10 Å, 
with the outer box extending an additional 20 Å. The ligand was docked using the 
standard precision (SP) algorithm and scored using ‘GlideScore’ (Schrödinger). 
The GNF6702 GlideScore is equal to −8.5. 
Receptor, enzyme and ion channel assays. GNF6702 profiling was performed at 
10 µM concentration in a selectivity panel at Eurofins (www.eurofinspanlabs.com/ 
Catalog/AssayCatalog/AssayCatalog.aspx). Listed values correspond to the assay 
readout values expressed relative to the DMSO control. To determine inhibition 
of a subset of human tyrosine kinases by GNF6702, the inhibitor was profiled on 
a panel of Ba/F3 cell lines expressing individual Tel-activated kinases as described 
previously”’. All assays were performed as single technical repeats. 
Determination of GNF6702 thermodynamic solubility. The solubility of 
GNF6702 was assessed in a high throughput thermodynamic solubility assay as 
described previously*". First, 25 µl of GNF6702 DMSO solutions were transferred 
to individual wells of a 96-well plate. DMSO was evaporated and 250 µl of 67 mM 
potassium phosphate buffer pH 6.8 were added to yield projected final compound 
concentrations from 1 µM to 100 µM. The plate was sealed to prevent solvent loss 
and shaken for 24h at room temperature. The plate was then filtered to remove 
non-dissolved material. Concentration of GNF6702 in individual plate wells was 
determined by measuring solution UV absorbance with reference to a GNF6702 
calibration curve. 
Determination of GNF6702 permeability in Caco-2 assay. A 96-Multiwell Insert 
System (BD Biosciences) was used for the Caco-2 cell culture and permeability 
assay as described previously**. Caco-2 cells were seeded onto insert wells at 
a density of 1.48 x 10° cells per ml and allowed to grow for 19-23 days before 
assays. To measure both absorptive (apical to basolateral (A–B)) and secretory 
(basolateral to apical (B–A)) compound transport, a solution of GNF6702 at 10 µM 
concentration in 0.5% DMSO was added to donor wells. The plate was incubated 
at 37°C for 2h, with samples taken at the beginning and end of the incubation from 
both donor and acceptor wells. The concentration of GNF6702 was determined 
by LC-MS/MS. 

Apparent drug permeability (Papp) was calculated using the following equation: 

\[ P_{\mathrm{app}} = \frac{dQ}{dt} \times \frac{1}{A \times C_{\mathrm{in}}} \]

where dQ/dt is the total amount of a test compound transported to the acceptor 
chamber per unit of time (nmol/s), A is the surface area of the transport membrane 
(0.0804 cm²), Cin is the initial compound concentration in the donor chamber 
(10 µM), and Papp is expressed as cm/s. 
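
As a worked illustration of the permeability calculation, the following Python sketch evaluates Papp for a given transport rate. The function name and the example transport rate are hypothetical. The only unit conversion needed is that 1 µM equals 1 nmol/ml, that is, 1 nmol/cm³, so the result comes out directly in cm/s.

```python
def apparent_permeability(dq_dt_nmol_per_s, area_cm2=0.0804, c_in_um=10.0):
    """Apparent permeability (cm/s) from transport rate, membrane area and donor concentration.

    dq_dt_nmol_per_s: amount transported to the acceptor chamber per unit time (nmol/s)
    area_cm2: surface area of the transport membrane (cm^2)
    c_in_um: initial donor concentration (µM)
    """
    c_in_nmol_per_cm3 = c_in_um  # 1 µM = 1 nmol/ml = 1 nmol/cm^3
    return dq_dt_nmol_per_s / (area_cm2 * c_in_nmol_per_cm3)
```

For example, a hypothetical transport rate of 8.04 x 10⁻⁶ nmol/s with the membrane area and donor concentration stated above corresponds to a Papp of about 1 x 10⁻⁵ cm/s.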

Determination of human CYP450 inhibition by GNF6702. Extent of 
inhibition of major human CYP450 isoforms 2C9, 2D6 and 3A4 by GNF6702 
was determined using pooled human liver microsomes and the known 
specific substrates of various CYP450 isoforms: diclofenac (5 µM), bufuralol 
(5 µM), midazolam (5 µM), and testosterone (50 µM). Probe substrates 
were used at concentrations equal to their reported Km values. 
The CYP450 inhibition assays with probe substrates diclofenac (2C9) or 
midazolam (3A4) were incubated at 37°C for 5 to 10 min using a microsomal 
protein concentration of 0.05 mg/ml. Probe substrates bufuralol (2D6) and 
testosterone (3A4) were incubated at 37°C for 20 min using microsomal 
concentration 0.5 mg/ml. The test concentrations of GNF6702 ranged from 
0.5 to 25 µM in the presence of 1% DMSO. The reactions were initiated by adding 
NADPH (1 mM final concentration; Sigma-Aldrich) after a 5-min preincubation. 
Incubations were terminated by the addition of 300 µl of acetonitrile to 100 µl 
of a sample. No detectable cytochrome P450 inhibition was observed. Extent of 
CYP450 isoform inhibition was determined by quantifying residual concentrations 
of individual CYP450 substrate probes at the end of reactions by LC-MS/MS. 
Determination of GNF6702 in vitro metabolic stability. The intrinsic metabolic 
stability of GNF6702 was determined in mouse and human liver microsomes 
using the compound depletion approach and LC-MS/MS quantification. The 
assay measured the rate and extent of metabolism of GNF6702 by measuring the 
disappearance of the compound. The assay determined GNF6702 in vitro half-life 
(T1/2) and hepatic extraction ratios (ER) as described previously**. GNF6702 was 
incubated for 30 min at 1.0 µM concentration in a buffer containing 1.0 mg/ml liver 
microsomes. Samples (50 µl) were collected at 0, 5, 15 and 30 min and immediately 
quenched by addition of 150 µl of ice-cold acetonitrile/methanol/water mixture 
(8/1/1). Quantification of GNF6702 in samples was performed by LC-MS/MS, 
and the in vitro intrinsic clearance was determined using the substrate depletion 
method. The intrinsic clearance, CLint, was calculated using the following equation: 

\[ CL_{\mathrm{int}} = \frac{0.693}{T_{1/2}} \times \frac{V}{M} \]

where T1/2 is the in vitro half-life, V (µl) is the reaction volume, and M (mg) is the 
microsomal protein amount. Finally the hepatic extraction ratio is calculated as: 

\[ ER = \frac{CL_{\mathrm{h}}}{Q_{\mathrm{h}}} \]

where CLh = hepatic clearance, Qh = hepatic blood flow. 
CLh was calculated using the following equation: 

\[ CL_{\mathrm{h}} = \frac{Q_{\mathrm{h}} \times f_{\mathrm{u}} \times CL_{\mathrm{int}}}{Q_{\mathrm{h}} + f_{\mathrm{u}} \times CL_{\mathrm{int}}} \]

where fu = fraction unbound to protein (assumed to be 1). 
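
The substrate depletion calculation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, the half-life is taken from an ordinary least-squares fit of ln(concentration) against time, and the scaling of microsomal CLint to whole-liver units (needed before it can be compared with hepatic blood flow) is not described in the text, so the caller is assumed to supply `cl_int` and `q_h` in the same units.

```python
import math


def half_life_from_depletion(times_min, concentrations):
    """In vitro half-life (min) from the slope of ln(concentration) versus time."""
    logs = [math.log(c) for c in concentrations]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
             / sum((t - mean_t) ** 2 for t in times_min))
    return 0.693 / -slope  # slope is negative for a depleting compound


def intrinsic_clearance(t_half_min, volume_ul, protein_mg):
    """CLint = (0.693 / T1/2) x (V / M), in µl/min/mg."""
    return 0.693 / t_half_min * volume_ul / protein_mg


def hepatic_clearance(q_h, cl_int, f_u=1.0):
    """CLh = Qh x fu x CLint / (Qh + fu x CLint); q_h and cl_int share units."""
    return q_h * f_u * cl_int / (q_h + f_u * cl_int)


def extraction_ratio(q_h, cl_int, f_u=1.0):
    """ER = CLh / Qh."""
    return hepatic_clearance(q_h, cl_int, f_u) / q_h
```

With fu fixed at 1, as assumed in the text, ER depends only on the ratio of CLint to Qh and approaches 1 when CLint greatly exceeds hepatic blood flow.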

Pharmacokinetic studies. An outline of various in vitro and in vivo DMPK assays 
used in this study for compound profiling was summarized previously“. Determination 
of the pharmacokinetic properties of GNF compounds and calculation of pharmacokinetic 
parameters were performed as described previously”*. Mean compound plasma 
concentrations were calculated from fitted functions approximating compound 
plasma profile throughout eight days of dosing. Blinding was not possible in these 
experiments. 

Bioanalysis of GNF6702 in plasma. Plasma concentration of GNF6702 was 
quantified using an LC-MS/MS assay. A solution of 20 ng/ml of verapamil 
hydrochloride (Sigma-Aldrich) in acetonitrile/methanol mixture (3/1 by volume) was 
used as an internal standard. Twenty microlitres of plasma samples were mixed 
with 200 µl of internal standard solution. The samples were vortexed and then 
centrifuged in an Eppendorf Centrifuge 5810 R (Eppendorf) at 4,000 r.p.m. for 
5 min at 4 °C to remove precipitated plasma proteins. The supernatants (150 µl) 
were transferred to a 96-well plate and mixed with 150 µl H2O. The samples (10 µl) 
were then injected onto a Zorbax SB-C8 analytical column (2.1 x 30 mm, 3.5 µm; 
Agilent Technologies) and separated using a three step gradient (1st step: 1.5 ml 
of 0.05% formic acid in 10% acetonitrile; 2nd step: 0.5 ml of 0.05% formic acid in 
100% acetonitrile; 3rd step: 0.5 ml of 0.05% formic acid in 10% acetonitrile) at flow 
rate of 700 µl/min. GNF6702 and verapamil were eluted at retention time 1.19 and 
1.17 min, respectively. The HPLC system, consisting of Agilent 1260 series binary 
pump (Agilent Technologies), Agilent 1260 series micro vacuum degasser (Agilent 
Technologies) and CTC PAL-HTC-xt analytics autosampler (LEAP Technologies) 
was interfaced to a SCIEX API 4000 triple quadrupole mass spectrometer (Sciex). 
Mass spectrometry analysis was carried out using atmospheric pressure chemical 
ionization (APCI) in the positive ion mode. GNF6702 (430.07 > 333.20) and 
verapamil (455.16 > 164.90) peak integrations were performed using Analyst 
1.5 software (Sciex). The lower limit of quantification (LLOQ) in plasma was 
1.0 ng/ml. Samples were quantified using seven calibration standards (dynamic 
range 1-5,000 ng/ml) prepared in plasma and processed as described above. 
Formulation of study drugs for in vivo efficacy experiments. All compounds 
administered to mice during efficacy experiments were formulated as suspensions 
in distilled water containing 0.5% methylcellulose (Sigma-Aldrich) and 0.5% 
Tween 80 (Sigma-Aldrich). During a treatment course, each mouse received 0.2 ml 
of drug suspension per dose by oral gavage. 

Mouse model of visceral leishmaniasis. Female BALB/cJ mice (Envigo; 6-8 
weeks old) were infected by tail vein injection with 4 x 10⁷ L. donovani MHOM/ 
ET/67/HU3 splenic amastigotes (protocol number P11-319). Seven days 
after infection, animals were orally dosed for eight days with vehicle (0.5% 
methylcellulose/0.5% Tween 80), miltefosine (12 mg/kg once-daily; 
Sigma-Aldrich), or a GNF compound (twice-daily). On the first day of dosing, three 
mice were used for collection of blood for PK determination and euthanized 
afterwards. On the last day of dosing, PK samples were collected from remaining 
five mice, which were also used for determination of compound efficacy 
(n=5 mice per group). Liver samples were collected from these five mice and 
L. donovani parasite burdens were quantified by qPCR as follows. Total DNA was 
extracted from drug-treated mice livers using the DNeasy Blood and Tissue Kit 
(Qiagen). Two types of DNA were quantified in parallel using the TaqMan assay: 
L. donovani major surface glycoprotein gp63 (Ldon_GP63) and mouse Gapdh. 
Leishmania donovani gp63 DNA was quantified with the following set of primers: 
TGCGGTTTATCCTCTAGCGATAT (forward), AGTCCATGAAGGCGGAGATG 
(reverse), and TGGCAGTACTTCACGGAC (TaqMan MGB probe, 5’-FAM- 
labelled reporter dye, non-fluorescent quencher). Mouse Gapdh DNA was 
quantified with the following set of primers: GCCGCCATGTTGCAAAC 
(forward primer), CGAGAGGAATGAGGTTAGTCACAA (reverse primer), and 
ATGAATGAACCGCCGTTAT (TaqMan MGB probe, 5’-FAM-labelled reporter 
dye, non-fluorescent quencher). Each qPCR reaction (10 µl) included 5 µl of 
TaqMan Gene Expression Master Mix (Life Technologies), 0.5 µl of a 20x primer/ 
probe mix (Life Technologies), and 4.5 µl (50 ng) of total DNA from liver samples. 
DNA amount was quantified using the Applied Biosystems 7900HT instrument. 
Leishmania donovani parasite burden (RU: relative units) was expressed as the 
abundance of L. donovani gp63 DNA relative to the abundance of mouse Gapdh 
DNA. 
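
The relative-unit readout can be illustrated with a short Python sketch. The text states only that burden is the abundance of gp63 DNA relative to Gapdh DNA; it does not say how abundances were derived from the TaqMan signal. The second function below therefore rests on an assumption: a comparative threshold-cycle (ΔCt) calculation with equal amplification efficiency for both targets (a doubling per cycle by default). All function and parameter names are hypothetical.

```python
def relative_burden(gp63_quantity, gapdh_quantity):
    """Parasite burden (relative units) as the ratio of gp63 to Gapdh DNA abundance."""
    return gp63_quantity / gapdh_quantity


def relative_burden_from_ct(ct_gp63, ct_gapdh, efficiency=2.0):
    """Burden from threshold cycles, assuming both assays amplify with equal efficiency.

    A lower Ct means more template, so the ratio is efficiency ** (Ct_Gapdh - Ct_gp63).
    """
    return efficiency ** (ct_gapdh - ct_gp63)
```

Under this assumption, a gp63 signal that crosses the threshold five cycles later than Gapdh corresponds to a relative burden of 2⁻⁵.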

Mouse footpad model of cutaneous leishmaniasis. Leishmania major MHOM/ 
SA/85/JISH118 metacyclic promastigotes were generated and purified by the 
peanut agglutinin method as described elsewhere”. To establish the L. major 
footpad infection, female BALB/c] mice (Envigo; 6-8 weeks old; protocol number 
P11-319) were injected with a suspension of L. major metacyclic promastigotes 
(1 x 10° parasites in 50 µl) into their left hind footpads. After eight days of infection,
animals were dosed with vehicle, miltefosine (30 mg/kg once-daily), or indicated 
regimens of GNF6702 for seven days (n = 6 mice per group). The progress of 
infection was monitored by measuring the size (length and thickness) of hind 
footpad swelling using digital calipers. At the end of the study, the mice were 
euthanized, and the footpad tissues were extracted and used for genomic 
DNA isolation with the DNeasy Blood and Tissue kit (Qiagen). The L. major 
footpad burden was determined by qPCR quantification of kinetoplastid 
minicircle DNA (forward primer: 5’-TTTTACACCTCCCCCCAGTTT-3’;
reverse primer: 5’-CCCGTTCATAATTTCCCGAAA-3’; TaqMan MGB probe:
5’-AGGCCAAAAATGG-3’, 5’-FAM (6-carboxyfluorescein)-labelled reporter
dye, non-fluorescent quencher). The amounts of mouse chromosomal DNA
in extracted samples were quantified in parallel qPCR using a glyceraldehyde- 
3-phosphate dehydrogenase (Gapdh) TaqMan assay as described for mouse VL 
model above. Leishmania major burden in footpad was expressed as the ratio of 
kinetoplast minicircle DNA to mouse Gapdh. P values for the between-groups 
differences in efficacies were calculated with a Student's paired t-test with a two- 
tailed distribution. 

Mouse model of Chagas disease. Compound efficacy in a mouse model of 
Chagas disease was determined as described previously”*. Female C57BL/6 mice 
(Envigo; 6-8 weeks old; protocol number P11-316) were infected by intraperitoneal
injection with 10° tissue culture-derived T. cruzi CL trypomastigotes.
Starting at 35 days after infection, the animals were dosed orally once-daily with 
100 mg/kg benznidazole (Sigma-Aldrich) and indicated doses of GNF6702 (1, 3, 
and 10 mg/kg twice-daily, n= 8 per group) for 20 days. Ten days following the 
end of drug treatment, the mice underwent four cycles of cyclophosphamide 
immunosuppression, each cycle lasting one week. During each immunosuppres- 
sion cycle, mice were dosed by oral gavage once-daily with 200 mg/kg cyclophos- 
phamide (suspension in 0.5% methylcellulose/ 0.5% Tween80 aqueous solution) 
on day 1 and day 4 of the cycle. After the fourth immunosuppression cycle, blood 
samples were collected from the orbital venous sinus of each mouse, mice were 
euthanized and heart and colon samples were collected. Samples from treated 
mice were used for extraction of total DNA using the High Pure PCR template 
preparation kit (Roche). The amounts of T. cruzi satellite DNA (195-bp fragment)
in extracted DNA samples were quantified by real-time qPCR TaqMan assay (Life 
Technologies) with the following set of primers: AATTATGAATGGCGGGAGTCA 
(forward primer), CCAGTGTGTGAACACGCAAAC (reverse primer), 
and AGACACTCTCTTTCAATGTA (TaqMan MGB probe, 5’-FAM 
(6-carboxyfluorescein)-labelled reporter dye, non-fluorescent quencher). The 
amounts of mouse chromosomal DNA in extracted samples were quantified 
in parallel qPCR reactions using a Gapdh (glyceraldehyde-3-phosphate 
dehydrogenase) TaqMan assay as described for mouse VL model above. Each 
qPCR mixture (10 µl) included 5 µl of TaqMan Gene Expression master mix (Life
Technologies), 0.5 µl of a 20× primer/probe mix (Life Technologies), and 4.5 µl
(50 ng) of total DNA extracted from blood samples. PCRs were run on the Applied
Biosystems 7900HT instrument. Trypanosoma cruzi parasitemia was expressed as 
the abundance of T. cruzi microsatellite DNA relative to the abundance of mouse 
Gapdh DNA. 

Mouse model of stage II HAT. Female CD1 (Charles River UK; ~8 weeks old; project 
license number PPL 60/4442) mice were infected by injection into the peritoneum 
with 3 x 10" T. brucei (GVR35-VSL2) bloodstream form parasites*®. Starting on
day 21, mice were dosed by oral gavage once-daily with GNF6702 (n= 6) at 100 mg/kg 
for 7 days or a single dose of diminazene aceturate (Sigma-Aldrich) at 40 mg/kg in 
sterile water was administered by i.p. injection (n= 3). A group of untreated mice
(n=3) was included as controls. 

Mice were monitored weekly for parasitemia from day 21 post-infection. 
Trypanosoma brucei was quantified in blood samples from the tail vein by 
microscopy, and in vivo bioluminescence imaging of infected mice was performed 
before treatment on day 21 post-infection and in weeks following the treatment 
(day 28, 35, 42, 56, 63, 72, 84, 92 post-infection). Imaging on groups of three mice 
was performed 10 min after i.p. injection of 150 mg D-luciferin (Promega)/kg body
weight (in PBS) using an IVIS Spectrum (Perkin Elmer) as described previously’.
A group of uninfected mice (age-matched for day 0 time point; n = 4) were
imaged using the same acquisition settings to show the background biolumi- 
nescence (Fig. 2e, grey-filled squares) in the absence of luciferase-expressing 
T. brucei after day 92 of the experiment. Untreated and diminazene-treated 
mice were euthanized on days 32 and 35, and day 42, respectively, due to high 
parasitemia or the development of symptoms related to CNS infection. GNF6702- 
treated mice were euthanized on day 92. No parasitemia or clinical symptoms were 
observed at this point. At the specified endpoints mice were sacrificed by cervical 
dislocation, after which whole brains were removed and imaged ex vivo within 
10 min after administration of 100 µl of D-luciferin onto the brain surface. Data
analysis for bioluminescence imaging was performed using Living Image Software 
(Perkin Elmer). The same rectangular region of interest (ROI) covering the mouse 
body was used for each whole-body image to show the bioluminescence in total 
flux (photons per second) within that region. Image panels of whole mouse bodies 
are composites of the original images with areas outside the ROI cropped out to 
save space. For ex vivo brain images the same oval shaped ROI was used to display 
the bioluminescence detected for each mouse brain at the respective endpoints.

Chemical synthesis. The detailed procedures for chemical synthesis are presented
in Supplementary Information.


31. Zhang, X., Goncalves, R. & Mosser, D. M. The isolation and characterization of
murine macrophages. Curr. Protoc. Immunol. Chapter 14, Unit 14.1 (2008).
32. Khare, S. et al. Utilizing chemical genomics to identify cytochrome b as a novel
drug target for Chagas disease. PLoS Pathog. 11, e1005058 (2015).
33. Buckner, F. S., Verlinde, C. L., La Flamme, A. C. & Van Voorhis, W. C. Efficient
technique for screening drugs for activity against Trypanosoma cruzi using
parasites expressing beta-galactosidase. Antimicrob. Agents Chemother. 40,
2592-2597 (1996).
34. Logan-Klumpler, F. J. et al. GeneDB-an annotation database for pathogens.
Nucleic Acids Res. 40, D98-D108 (2012).
35. Taylor, M. C. & Kelly, J. M. pTcINDEX: a stable tetracycline-regulated expression
vector for Trypanosoma cruzi. BMC Biotechnol. 6, 32 (2006).
36. Hariharan, S., Ajioka, J. & Swindle, J. Stable transformation of Trypanosoma
cruzi: inactivation of the PUB12.5 polyubiquitin gene by targeted gene
disruption. Mol. Biochem. Parasitol. 57, 15-30 (1993).
37. Wirtz, E., Leal, S., Ochatt, C. & Cross, G. A. A tightly regulated inducible
expression system for conditional gene knock-outs and dominant-negative
genetics in Trypanosoma brucei. Mol. Biochem. Parasitol. 99, 89-101 (1999).
38. Wilk, S. & Chen, W.-E. Purification of the eukaryotic 20S proteasome.
Curr. Protoc. Protein Sci. Chapter 21 (2001).
39. Unno, M. et al. The structure of the mammalian 20S proteasome at 2.75 Å
resolution. Structure 10, 609-618 (2002).
40. Melnick, J. S. et al. An efficient rapid system for profiling the cellular activities
of molecular libraries. Proc. Natl Acad. Sci. USA 103, 3153-3158 (2006).
41. Waters, N. J., Jones, R., Williams, G. & Sohal, B. Validation of a rapid equilibrium
dialysis approach for the measurement of plasma protein binding. J. Pharm.
Sci. 97, 4586-4595 (2008).
42. Wang, J. & Skolnik, S. Recent advances in physicochemical and ADMET
profiling in drug discovery. Chem. Biodivers. 6, 1887-1899 (2009).
43. Kalvass, J. C., Tess, D. A., Giragossian, C., Linhares, M. C. & Maurer, T. S.
Influence of microsomal concentration on apparent intrinsic clearance:
implications for scaling in vitro data. Drug Metab. Dispos. 29, 1332-1336 (2001).
44. Li, C. et al. A modern in vivo pharmacokinetic paradigm: combining snapshot,
rapid and full PK approaches to optimize and expedite early drug discovery.
Drug Discov. Today 18, 71-78 (2013).
45. Sacks, D. L. & Melby, P. C. Animal models for the analysis of immune
responses to leishmaniasis. Curr. Protoc. Immunol. Chapter 19, Unit 19.12 (2001).
46. McLatchie, A. P. et al. Highly sensitive in vivo imaging of Trypanosoma brucei
expressing “red-shifted” luciferase. PLoS Negl. Trop. Dis. 7, e2571 (2013).


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 




[Figure: a, free GNF6702 plasma concentration versus time (h) on treatment day 1 and treatment day 8, for 10, 3, 1 and 0.3 mg kg⁻¹ b.i.d.; b, GNF6702 PK after 20 mg kg⁻¹ p.o., concentration in plasma and brain versus time (h).]

Extended Data Figure 1 | Pharmacokinetic profile of GNF6702
in mouse. a, Time profiles of mean free plasma concentration of
GNF6702 in mouse model of visceral leishmaniasis; free GNF6702
concentration values were predicted from measured total plasma
concentration values collected on day 1 and day 8 of treatment.
Dashed blue lines correspond to intra-macrophage L. donovani
EC50 of 18 ± 1.8 nM and EC90 of 42 ± 5.6 nM. Circles, means ± s.d.; n = 3
mice for treatment day 1; n = 5 mice for treatment day 8; fraction unbound in
mouse plasma = 0.063. For data points lacking error bars, standard deviations
are smaller than circles representing means. b, Time course of total GNF6702
concentration in mouse plasma and brain after single oral dose (20 mg kg⁻¹);
n = 2 mice per time point; circles, measured values; rectangles, means.

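
The caption states that free plasma concentrations were predicted from measured total concentrations, and it reports a fraction unbound of 0.063. A minimal sketch of that conversion follows, assuming the prediction is a simple multiplication by the fraction unbound (the usual definition, but not spelled out in the caption). The function names and the example inputs are hypothetical; the 18 nM and 42 nM thresholds are the intra-macrophage L. donovani values quoted in the caption.

```python
FRACTION_UNBOUND = 0.063  # mouse plasma, from the caption
EC50_NM = 18.0            # intra-macrophage L. donovani, from the caption
EC90_NM = 42.0


def free_concentration(total_nm: float, fraction_unbound: float = FRACTION_UNBOUND) -> float:
    """Free (unbound) plasma concentration predicted from a total concentration."""
    return total_nm * fraction_unbound


def coverage(total_nm: float) -> dict:
    """Compare the predicted free concentration with the two potency thresholds."""
    free = free_concentration(total_nm)
    return {
        "free_nM": free,
        "above_EC50": free > EC50_NM,
        "above_EC90": free > EC90_NM,
    }


if __name__ == "__main__":
    # Hypothetical total plasma concentrations in nM.
    for total in (250.0, 500.0, 1000.0):
        print(total, coverage(total))
```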
Extended Data Figure 2 | GNF6702 clears parasites from mice infected
with T. brucei. a, In vivo quantification of bioluminescent T. brucei in
infected mice before and after treatment. i.p., intraperitoneal; day 21, start
of treatment; day 28, 24 h after last GNF6702 dose; day 42, evaluation of
early parasite recrudescence in mice treated with diminazene aceturate
(n = 3); day 42 and 92, absence of parasite recrudescence in mice treated
with GNF6702 (n = 6). Images from uninfected mice (3 mice of 4 are
shown) age-matched for day 0 were collected independently using the
same acquisition settings. Parasitemia (blue font) and whole mouse total
flux (black font) values of each animal are shown above the image;
N.D., not detectable. Within each group the mouse numbers in yellow
(top left in each image) refer to the same mouse imaged throughout.
Complete sets of parasitemia and whole mouse total flux values collected
on individual mice throughout the experiment are listed in Supplementary
Tables 4 and 5. b, Brains from mice shown in a were soaked in luciferin
and imaged for presence of bioluminescent T. brucei at the indicated time
points. For three diminazene-treated mice, two images of each brain
are shown, one at a lower sensitivity (left) and the other at a high signal
intensity scale.

L. donovani EC50 = 320 ± 7.1 nM
T. brucei EC50 = 73 ± 2.9 nM
T. cruzi EC50 = 154 ± 12 nM
3T3 CC50 > 20 µM
Macrophage CC50 = 18 ± 2.1 µM
F = 10%
CL = 8.8 ml min⁻¹ kg⁻¹

Extended Data Figure 3 | Structures and profiles of GNF3943 and
GNF8000 used for selection of resistant T. cruzi lines. Leishmania
donovani, amastigotes proliferating within primary mouse macrophages;
T. brucei, the bloodstream form trypomastigotes; T. cruzi, amastigotes
proliferating in 3T3 fibroblast cells; macrophage, mouse primary
peritoneal macrophages; EC50 and CC50, half-maximum growth-inhibition
concentration; F, oral bioavailability in mouse after administering single
compound dose (20 mg kg⁻¹) as a suspension; CL, plasma clearance in
mouse after single i.v. bolus dose (5 mg kg⁻¹); all EC50 and CC50 values
correspond to means ± s.e.m. (n = 4 technical replicates).

[Figure, panel a: growth (RU) versus inhibitor concentration (µM, 0.01-10) for wild-type, GNF3943-resistant and GNF8000-resistant T. cruzi; inhibitors: GNF6702, bortezomib, MG132, nifurtimox.]

b
                        wild-type T. cruzi   GNF3943-resistant T. cruzi   GNF8000-resistant T. cruzi
GNF6702 EC50 (µM)       0.15 ± 0.002         5.5 ± 0.016                  >10
bortezomib EC50 (µM)    0.16 ± 0.006         0.12 ± 0.020                 0.10 ± 0.007
MG132 EC50 (µM)         0.61 ± 0.015         0.76 ± 0.071                 0.48 ± 0.052
nifurtimox EC50 (µM)    1.0 ± 0.09           1.0 ± 0.11                   2.4 ± 0.15

c
Parasite, ectopic gene, expression         GNF6702 EC50 (µM)   bortezomib EC50 (µM)
T. cruzi, PSMB4(WT), non-induced           0.20 ± 0.007        0.46 ± 0.059
T. cruzi, PSMB4(WT), induced               0.20 ± 0.023        0.40 ± 0.057
T. cruzi, PSMB4(F24L), non-induced         0.56 ± 0.029        0.45 ± 0.008
T. cruzi, PSMB4(F24L), induced             >10                 0.37 ± 0.015
T. brucei, PSMB4(WT), constitutive         0.018 ± 0.0018      0.00094 ± 0.00005
T. brucei, PSMB4(F24L), constitutive       1.2 ± 0.013         0.0011 ± 0.00026

Extended Data Figure 4 | Mutations in proteasome β4 subunit
confer resistance to GNF6702 in T. cruzi and T. brucei. a, Growth
curves of wild-type, GNF3943-resistant and GNF8000-resistant
T. cruzi epimastigote strains in the presence of increasing concentrations
of GNF6702, nifurtimox, bortezomib and MG132; RU (relative units)
corresponds to parasite growth relative to the DMSO control (%); for
data points lacking error bars, standard errors are smaller than circles
representing means; owing to limited aqueous solubility, the highest tested
GNF6702 concentration was 10 µM. b, Growth-inhibition EC50 values
of GNF6702, bortezomib, MG132 and nifurtimox on indicated T. cruzi
strains. c, Growth-inhibition EC50 values of GNF6702 and bortezomib on
T. cruzi epimastigotes and T. brucei bloodstream form trypomastigotes
overexpressing wild-type PSMB4 or PSMB4(F24L). Data shown in panels a, b
and c correspond to means ± s.e.m. (n = 3 technical replicates).

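
One way to read the EC50 values in panels b and c is as a fold shift relative to the wild-type strain. The sketch below shows that arithmetic; the fold-shift framing and the function name are illustrative additions, not an analysis reported in the caption. The inputs are the panel b means for GNF6702 (0.15 wild-type, 5.5 GNF3943-resistant) and bortezomib (0.16 wild-type, 0.12 GNF3943-resistant).

```python
def fold_shift(ec50_test: float, ec50_reference: float) -> float:
    """Ratio of a test-strain EC50 to a reference-strain EC50 (same units)."""
    if ec50_reference <= 0:
        raise ValueError("reference EC50 must be positive")
    return ec50_test / ec50_reference


if __name__ == "__main__":
    # Panel b means: GNF3943-resistant versus wild-type T. cruzi.
    print("GNF6702:", fold_shift(5.5, 0.15))      # large shift for the selecting series
    print("bortezomib:", fold_shift(0.12, 0.16))  # close to 1, no cross-resistance
    # A censored value such as ">10" only gives a lower bound on the shift.
    print("GNF6702, GNF8000-resistant, at least:", fold_shift(10.0, 0.15))
```

Because the ratio is unitless, it does not depend on the concentration unit as long as both values share it.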
[Figure: parasite growth-inhibition EC50 (µM) plotted against proteasome IC50 (µM, 0.01-10), one panel each for T. brucei and T. cruzi.]

Extended Data Figure 5 | Correlation between inhibition of parasite
proteasome chymotrypsin-like activity and parasite growth inhibition
by the GNF6702 compound series. IC50, half-maximum inhibition of
indicated parasite proteasome; T. brucei EC50, half-maximum growth
inhibition on T. brucei bloodstream form trypomastigotes; T. cruzi EC50,
half-maximum growth inhibition on T. cruzi amastigotes proliferating
inside 3T3 cells; data points correspond to means of 2 technical replicates;
red circles, IC50 > 20 µM; yellow circles, IC50 > 20 µM and EC50 > 25 µM;
data for 317 analogues are shown.

[Figure, panel a: alignment of PSMB4 sequences from L. donovani, T. cruzi, T. brucei and H. sapiens; final block: H. sapiens 190 DLDNISFPKQGS.]

Extended Data Figure 6 | Hypothetical model of GNF6702 binding to
T. cruzi proteasome β4 subunit. a, Alignment of amino acid sequences
of proteasome β4 subunits (PSMB4) from L. donovani, T. cruzi, T. brucei
and Homo sapiens. Green, amino acid residues conserved between human
and kinetoplastid PSMB4 proteins; blue, amino acid residues conserved
only among kinetoplastid PSMB4 proteins; black, amino acids mutated in
T. cruzi mutants resistant to analogues from the GNF6702 series.
b, Surface representation of the modelled T. cruzi 20S proteasome
structure showing relative positions of the β5 and β4 subunits. β4 amino
acid residues F24 and I29 (coloured yellow) are located at the interface of
the two β subunits. GNF6702 is depicted in a sphere representation bound
into a predicted pocket on the β4 subunit surface with carbon, nitrogen,
oxygen and hydrogen atoms coloured magenta, blue, red and grey,
respectively. The other T. cruzi 20S proteasome subunits are coloured
grey. c, Close-up of the β5 and β4 subunits. The β5 subunit active site
(pocket 1, chymotrypsin-like activity) is coloured pale green. The
predicted β4 pocket (pocket 2) with bound GNF6702 is coloured blue.
The inhibitor is shown in a stick representation with atoms coloured as
described in caption for b. β4 residues F24 and I29 are coloured yellow.
The proteasome model shown in b and c was produced in the PyMOL
Molecular Graphics System, Version 1.8, Schrödinger, LLC.

[Figure: a, anti-ubiquitin western blots of T. cruzi epimastigote extracts after treatment with GNF6702 or bortezomib (µM); b, the same for 3T3 cells; molecular mass markers from 17 to 198 kDa.]

c
                        T. cruzi          3T3
GNF6702 EC50 (µM)       0.13 ± 0.010      >10
bortezomib EC50 (µM)    0.062 ± 0.001     0.040 ± 0.008

Extended Data Figure 7 | Effect of GNF6702 on accumulation of
ubiquitylated proteins by T. cruzi epimastigotes and 3T3 cells.
a, Western blot analysis of T. cruzi whole-cell extracts with anti-ubiquitin
antibody after treatment with GNF6702 and bortezomib. b, Western blot
analysis of 3T3 whole-cell extracts with anti-ubiquitin antibody after
treatment with GNF6702 and bortezomib. c, Concentrations of GNF6702
and bortezomib effecting half-maximum accumulation of ubiquitylated
proteins in T. cruzi and 3T3 cells (means ± s.e.m.; n = 3 technical
replicates); total ubiquitin signal values in individual blot lanes shown
in a and b were quantified and used for calculation of the listed EC50
values. In a and b, numbers above the blot lanes indicate compound
concentrations and D indicates control, DMSO-treated cells. For western
blot source data, see Supplementary Fig. 1.

Extended Data Table 1 | Point mutations identified by whole-genome sequencing in GNF3943- and GNF8000-resistant T. cruzi
epimastigotes

                                             Gene ID    GNF3943-resistant mutant   GNF8000-resistant mutant
Number of reads (clone 1/clone 2)                       78 × 10⁶/63 × 10⁶          48 × 10⁶/68 × 10⁶
Mapped reads (clone 1/clone 2) [%]                      87/87                      90/90
Average genome coverage (clone 1/clone 2)               82×/66×                    51×/66×
Proteasome beta 4 subunit                    3540409    I29M/I29M                  wt/F24L
Dynein heavy chain                           3548195    wt/P82L                    wt/wt
Trans-sialidase                              3542504    wt/wt                      wt/G90E
Trans-sialidase                              3542504    wt/wt                      wt/L93P
Hypothetical protein TCSYLVIO_005989         3547397    wt/wt                      wt/L627P
Hypothetical protein TCSYLVIO_005986         3547401    wt/wt                      wt/S55P

Extended Data Table 2 | Enzyme inhibition IC50 values of bortezomib and GNF6702 on three proteolytic activities of wild-type T. cruzi,
PSMB4(I29M) T. cruzi and H. sapiens proteasomes

Proteasome                               Activity        GNF6702 IC50 (µM)*   bortezomib IC50 (µM)*
Wild-type T. cruzi proteasome            chymotrypsin    0.035 ± 0.0013       0.091 ± 0.0075
                                         caspase         >10                  0.37 ± 0.012
                                         trypsin         >10                  1.7 ± 0.088
PSMB4(I29M) T. cruzi proteasome          chymotrypsin    >10                  0.26 ± 0.040
                                         caspase         >10                  0.54 ± 0.012
                                         trypsin         >10                  1.6 ± 0.058
H. sapiens constitutive proteasome       chymotrypsin    >10                  0.030 ± 0.0070
                                         caspase         >10                  0.16 ± 0.007
                                         trypsin         >10                  7.9 ± 0.15

*mean ± s.e.m.; n = 3 technical replicates.

Extended Data Table 3 | Inhibition kinetics parameters of GNF6702 on L. donovani and T. cruzi proteasomes

                        chymotrypsin-like activity            caspase-like activity       trypsin-like activity
                        L. donovani        T. cruzi           L. donovani    T. cruzi     L. donovani    T. cruzi
Ki ± s.e.m. (µM)        0.055 ± 0.006*     0.079 ± 0.003*     >10            >10          >10            >10
Km ± s.e.m. (µM)        3.6 ± 0.60*        2.6 ± 0.15*        N.A.†          N.A.†        N.A.†          N.A.†
Mode of inhibition      non-competitive    non-competitive    N.A.†          N.A.†        N.A.†          N.A.†
R² (goodness of fit)                       0.97               N.A.†          N.A.†        N.A.†          N.A.†

*mean ± s.e.m.; n = 3 technical replicates.
†not applicable
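
The Ki and Km values in this table, together with the reported non-competitive mode of inhibition, can be related through the textbook rate law for pure non-competitive inhibition. The sketch below assumes that standard form (a single Ki for free enzyme and enzyme-substrate complex); the table does not give the fitted equation, and the function names are hypothetical. The example uses the T. cruzi chymotrypsin-like values (Ki 0.079 µM, Km 2.6 µM).

```python
def noncompetitive_rate(substrate: float, inhibitor: float,
                        vmax: float, km: float, ki: float) -> float:
    """Initial rate under pure non-competitive inhibition.

    v = Vmax * [S] / ((Km + [S]) * (1 + [I] / Ki))
    All concentrations must share one unit (µM here).
    """
    return vmax * substrate / ((km + substrate) * (1.0 + inhibitor / ki))


def fractional_activity(inhibitor: float, ki: float) -> float:
    """Remaining activity relative to the uninhibited rate.

    For pure non-competitive inhibition this does not depend on [S].
    """
    return 1.0 / (1.0 + inhibitor / ki)


if __name__ == "__main__":
    KM_UM, KI_UM = 2.6, 0.079  # T. cruzi chymotrypsin-like activity, from the table
    for inhibitor_um in (0.0, 0.079, 0.79):
        rate = noncompetitive_rate(2.6, inhibitor_um, vmax=1.0, km=KM_UM, ki=KI_UM)
        print(inhibitor_um, rate, fractional_activity(inhibitor_um, KI_UM))
```

In this model an inhibitor concentration equal to Ki halves the rate at any substrate concentration, which is why raising the substrate cannot overcome the inhibition.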




LETTER 


doi:10.1038/nature19334 


Germinal centre hypoxia and regulation of antibody 
qualities by a hypoxia response system 


Sung Hoon Cho, Ariel L. Raybuck, Kristy Stengel, Mei Wei, Thomas C. Beck, Emmanuel Volanakis, James W. Thomas,
Scott Hiebert, Volker H. Haase & Mark R. Boothby


Germinal centres (GCs) promote humoral immunity and vaccine 
efficacy. In GCs, antigen-activated B cells proliferate, express high- 
affinity antibodies, promote antibody class switching, and yield B 
cell memory”. Whereas the cytokine milieu has long been known 
to regulate effector functions that include the choice of 
immunoglobulin class*“, both cell-autonomous’ and extrinsic®” 
metabolic programming have emerged as modulators of T-cell- 
mediated immunity*. Here we show in mice that GC light zones are 
hypoxic, and that low oxygen tension (pO2) alters B cell physiology
and function. In addition to reduced proliferation and increased
B cell death, low pO2 impairs antibody class switching to the pro-
inflammatory IgG2c antibody isotype by limiting the expression of
activation-induced cytosine deaminase (AID). Hypoxia induces HIF
transcription factors by restricting the activity of prolyl hydroxyl
dioxygenase enzymes, which hydroxylate HIF-1α and HIF-2α to
destabilize HIF by binding the von Hippel-Lindau tumour
suppressor protein (pVHL)’. B-cell-specific depletion of pVHL leads
to constitutive HIF stabilization, decreases antigen-specific GC 
B cells and undermines the generation of high-affinity IgG, 
switching to IgG2c, early memory B cells, and recall antibody 
responses. HIF induction can reprogram metabolic and growth 
factor gene expression. Sustained hypoxia or HIF induction 
by pVHL deficiency inhibits mTOR complex 1 (mTORC1) activity 
in B lymphoblasts, and mTORC1-haploinsufficient B cells 
have reduced clonal expansion, AID expression, and capacities to 
yield IgG2c and high-affinity antibodies. Thus, the normal 
physiology of GCs involves regional variegation of hypoxia, and 
HIF-dependent oxygen sensing regulates vital functions of B cells. 
We propose that the restriction of oxygen in lymphoid organs, 
which can be altered in pathophysiological states, modulates 
humoral immunity. 

The micro-anatomy of secondary lymphoid organs and rapid 
proliferation of activated lymphocytes in them? prompted testing for 
hypoxia. Using flow cytometry, HIF levels were found to be increased 
in GC-phenotype B (GCB) cells compared to other B cells in the spleens 
of immunized mice (Fig. 1a; Extended Data Fig. 1a). Immunofluorescent 
microscopy revealed that HIF was most increased in GCs (Fig. 1b; 
Extended Data Fig. 1b). HIF is induced under low oxygen. However, 
HIF-1α and HIF-2α subunits can be stabilized at normoxic pO2
(ref. 10), so we used chemical probes to mark hypoxic cells in vivo.
Spleen, lymph nodes and Peyer’s patches were analysed after injection
of pimonidazole or EF5 (ref. 11) and staining with antibody that binds
the adducts (Fig. 1c-e; Extended Data Fig. 1b-h). Fluorescence denoting
hypoxia localized predominantly to the GC and the signal for each agent
was weaker in the IgD+ zone’. Flow cytometry detected EF5 only with
GL7+ GCB cells (Fig. 1e), and a hypoxia-related gene signature was
enriched in GCB cells (Extended Data Fig. 1i). The EF5 and 


pimonidazole signals only partially filled GCs, which are subdivided 
into light and dark zones between which B cells cycle iteratively to 
promote high-affinity antibodies. EF5 labelling predominantly 
overlapped a follicular dendritic cell marker (CD35) restricted to the 
light zone (Fig. 1f). B lymphoblasts proliferate rapidly in the dark zone, 
whereas cell cycling decreases in the light zone!. The most EF5-positive 
GCB cells had entered S-phase at lower rates (percentage BrdU+)
(Fig. 1g, h) and more frequently activated an executioner caspase
(Fig. 1i). Thus, activated B cells experience hypoxia in GCs,
predominantly in their light zones. Notably, the more hypoxic GCB cells 
proliferated less and had increased apoptotic signalling. 

To test the effect of hypoxia on antibody class switching, activated 
B cells cultured in hypoxia (pO2 of 1%) were compared to controls
cultured at atmospheric (~21%) or venous (5%) pO2, using conditions
that promote IgG1 or the pro-inflammatory isotype IgG2c (Fig. 2a;
Extended Data Fig. 2). Hypoxia restricted B cell population growth 
(Fig. 2a, b), with increased caspase-3 activation and lower BrdU incor- 
poration (Extended Data Fig. 2a, b). Thus, O2 sufficiency promoted 
B cell proliferation by both improving survival and increasing cell 
cycling. These effects were paralleled by an altered balance in cell 
metabolism, as hypoxia promoted a higher glycolytic rate (Extended 
Data Fig. 2c) in activated B cells. Conversely, in vitro inhibition 
of prolyl hydroxyl dioxygenase (PHD) reduced O2 consumption, and
gene expression profiling of fresh ex vivo B cells showed major
differences between the non-GC and GC subsets (Extended Data
Fig. 2d, e, respectively). Moreover, IgG+ B cell frequencies were reduced
at 1% pO2 (Fig. 2a, Extended Data Fig. 2f). The enteric immune system
is a site of physiological hypoxia!’; notably, hypoxia did not decrease
the frequency of IgA+ B cells in IgA-promoting conditions (Fig. 2a,
Extended Data Fig. 2f). Switching requires multiple B cell divisions’. 
When fluorescein partitioning was analysed along with switching to 
IgG2c, hypoxia reduced switching by B cells at the same division 
number (Fig. 2b). Thus, hypoxia at levels of the GC light zone altered 
antibody class switching by a direct influence on class choice in addition 
to reducing proliferation and reprogramming B cell metabolism 
and survival. 

Class switch recombination is executed by AID, which is encoded by 
the Aicda gene!*. In IgG switch conditions, Aicda mRNA and AID 
protein were reduced by hypoxia (Fig. 2c, d; Extended Data Fig. 2g). By 
contrast, AID was not reduced by hypoxia in IgA switch conditions 
(Fig. 2d). Switch recombinase is directed to the immunoglobulin heavy
chain regions by transcription factors that create accessibility marked
by germ-line transcripts (GLTs)**. Hypoxia decreased induction of the
transcription factor T-bet and the T-bet-dependent Iγ2c GLT™
(Fig. 2e, f), whereas Rora mRNA and the Iα GLT were not reduced in
B cells at reduced pO2 (Fig. 2e, f). The PHD inhibitor dimethyloxalylglycine
(DMOG) reduced proliferation and increased apoptosis of B cells
Figure 1 | Hypoxia in GC light zones. a, Flow cytometry of HIF-1α in
GCBs (GL7+ B220+ gate) from sheep red blood cell (SRBC)-immunized
mice, and in the GL7− B220+ gate, compared to controls (rIgG1 instead
of primary anti-HIF-1α antibody; Extended Data Fig. 1a, Hif1aΔ/Δ
B cells stained with anti-HIF-1α). b, Left, immunofluorescent staining of
HIF-1α or controls (rIgG1; as in a) in GCs (GL7+ IgD−) and surrounding
follicles (IgD+ GL7−) (n = 12 GCs, 4 spleens in 2 experiments). Scale bar
in b, c, f, 100 µm. Right, HIF-1α signals quantified within GCs compared
to the GL7− follicular cells (Extended Data Fig. 1b, Hif1aΔ/Δ B cells stained
with anti-HIF-1α). AU, arbitrary units. c-e, GC hypoxia. Adducts, IgD
and GL7 were stained after immunized mice were injected with EF5,
pimonidazole (hypoxyprobe) or PBS (Veh.). c, Anti-pimonidazole staining
of spleen sections (representative of 24 GCs in 9 sections from
3 independent experiments, quantified in Extended Data Fig. 1c).
d, Quantified EF5 signals (Extended Data Fig. 1d) within GCs compared
to the GL7− follicles, as in b (n = 19 GCs from n = 5 mice each condition,
PBS and EF5; 3 independent experiments). e, A representative flow
cytometry result (n = 3 experiments) with anti-EF5 staining of spleen
cells after intravital injection with EF5 or PBS, as in c and d, gated
as in a. f, Hypoxia maps mostly to the light zone. Spleen sections as in
e, stained for CD35, GL7 and EF5, and anti-EF5 signals in CD35+ and
CD35− regions, quantified as in d. g, Flow cytometric measurements
of S-phase (BrdU+) GCB cells that were either hypoxic (EF5hi) or not
(EF5lo), from mice as in e after BrdU injection. h, BrdU incorporation
(n = 7 samples in two independent experiments). i, Fractions of cleaved
(activated) caspase 3-positive (CC3+) GCB cells, gated as in g. All data are
mean ± s.e.m.



cultured at 21% pO2 and severely restricted switching to IgG2c, whereas
switching to IgA exhibited less impairment (Extended Data Fig. 3a, b).
An inhibitor of HIF stabilization mitigated the reduction of IgG2c-
switched B cells by low oxygen (1% pO2) (Extended Data Fig. 3c). Akin
to hypoxia, PHD inhibition and HIF stabilization impaired AID, T-bet,
and Iγ2c GLT induction in the presence of the IgG2c switch cytokine
IFN-γ (Fig. 2c, e, f; Extended Data Fig. 4a-c). By contrast, levels of RNA
for RORα and the Iα GLT were higher in DMOG-treated cells than in
controls (Fig. 2c). Thus, hypoxia reduced AID and GLT induction in
the conditions promoting IgG2c, whereas Iα and AID levels were
maintained in IgA conditions, consistent with the relative effects on 
class-switched B cell antigen receptors (BCRs). 

pVHL destabilizes HIF by targeting hydroxylated alpha subunits 
for rapid proteasomal degradation in most oxygen-sufficient 
Figure 2 | Hypoxia regulates B cell survival, proliferation and class
switching. a, O2 modulates the spectrum of antibody isotypes. Surface
IgG1, IgG2c and IgA on B220+-gated cells, measured by flow cytometry
after activation of purified B cells cultured at pO2 of 21% (normoxia),
5% or 1% (hypoxia) using conditions promoting IgG1, IgG2c or IgA.
Flow cytometry data from one representative experiment (top) along with
bar graphs showing aggregate results of cell numbers and switch
efficiencies (bottom) (n = 4 for 5%, n = 7 for 1% and 21% pO2).
LPS, lipopolysaccharide. b, Flow cytometry of surface IgG2c (right) on
B cells gated by division number (left) after activation of CellTrace
Violet (CTV)-stained B cells and culture with IFN-γ as in a. Inset
numbers (bold font) denote the percentage of switched B cells at
indicated division numbers in this analysis; mean (± s.e.m.) values
from the independent replicate experiments (n = 3) are italicized.
Shaded overlay: CTV fluorescence of undivided cells cultured only in
BAFF. c, d, AID regulated by oxygen sufficiency. c, Aicda mRNA was
quantified in B cells activated and cultured as in a, or in the presence or
absence of PHD inhibitor DMOG. d, Relative AID expression, measured
as GFP fluorescence in AID-GFP transgenic B cells stained with CTV,
activated and cultured in the conditions of a and b. Representative GFP
fluorescence versus divisions for B220+ cells quantified from four
independent replicate analyses are in Extended Data Fig. 2g. e, f, Hypoxia
and PHD inhibition reduce T-bet and Iγ2c GLT induction, but not Rora or
Iα. Iγ2c GLT (e) and Tbx21 (f) mRNA measured after B cell cultures in
IgG2c conditions; Iα GLT (e) and Rora (f) mRNA in B cells cultured for
IgA switching (n = 3-4 experiments). BLD, below limit of detection. Data
are mean ± s.e.m.



environments”!*. To model persistent hypoxic signalling in vivo, 
we used conditional Vhl loss-of-function experiments. Mature 
B cells subjected to Vhl deletion yielded fewer antigen-binding GCB 
cells after immunization, less IgG2c antibody, and a substantial 
decrease in cells secreting antigen-specific IgG2c in primary responses 
(Fig. 3a-c, Extended Data Figs 5, 6). Cycling between the light and dark 
zones in GCs promotes higher affinity antibodies'®, so it was notable 
that for IgM and IgG1 pVHL depletion only impaired generation of 
high-affinity anti-4-hydroxy-3-nitrophenylacetyl (NP) antibodies 
Figure 3 | B cell-intrinsic role of pVHL in antibody response qualities. 
a-c, In adoptive transfer experiments (schematic diagram, Extended Data 
Fig. 5a), B cells purified from tamoxifen-treated mice were transferred into 
recipients after mixing with CD4+ T cells (polyclonal:OVA-specific OT-II 
cells = 4:1). Recipients were analysed after primary immunization, or, for 
memory responses (Extended Data Fig. 5d), after the primary and a recall 
immunization. a, b, VHL reduction causes HIF-dependent alterations in 
antibody responses. a, Primary NP-specific IgG2c antibody response in 
Rag0 recipients of wild-type (WT) or Vhlf/f;ERT2-Cre (Vhl cKO) B cells 
from tamoxifen-treated donor mice (n = 5 recipients of each genotype, 
distributed evenly between two independent replicate experiments). Other 
antibody isotypes are in Extended Data Fig. 5. b, High-affinity (NP2) or 
all-affinity (NP29) anti-NP antibodies of the indicated isotypes in sera 
from immunized recipients, measured by ELISA. Each dot represents one 
mouse (n = 9 of each genotype, distributed evenly among 3 independent 
experiments). Horizontal lines denote the mean. WT, wild type; Vhl cKO, 
pVHL-depleted conditional knockout (VhlΔ/Δ); V;H1;H2 cKO, 
pVHL-, HIF-1α- and HIF-2α-depleted conditional knockout (VhlΔ/Δ 
Hif1aΔ/Δ Epas1Δ/Δ). c, HIF-dependent reduction of antigen-specific 
B cell populations. Flow cytometry results scoring NP-binding GCB 
cells (B220+ GL7+ IgD−) and early memory (B220+ CD38+ IgM+ GL7− 
IgD−) phenotypes. One representative result from the same mice and 
experiments as shown in b (Extended Data Fig. 6c, d). d, VHL in B cells 
promotes Aicda and Tbx21 expression. Wild-type and VhlΔ/Δ B cells were 
activated, cultured and analysed as in Fig. 2c. e, B cells transduced with 
MIT, MIG, MIT-T-bet or pMx-GFP-AID retrovectors were cultured with 
BAFF and LPS + IFNγ in the presence or absence of DMOG. EV, empty 
retrovector. Frequencies of surface IgG2c+ events among B220+ cells 
analysed 4 days after transduction, with flow data from one experiment 
in Extended Data Fig. 4e (n = 3 independent experiments). Data are 
mean ± s.e.m. 


(Fig. 3b). The defect in primary responses substantially reduced IgG2c 
of all affinities (Fig. 3b, Extended Data Fig. 6a), whereas antigen- 
specific IgA was unaffected (Extended Data Fig. 6b). The effects of 
pVHL depletion on IgG2c and high-affinity IgG1 antibody responses 
were HIF-dependent (Fig. 3b). Defects of antibody responses were 
heightened in recall (secondary) immunity when compared to primary 
responses (Extended Data Fig. 5c, d compared to Fig. 3a). pVHL loss 
reduced the population of antigen-binding memory B cells, an effect 
mitigated by concomitant HIF depletion (Fig. 3c, Extended Data 
Fig. 6d). Aicda mRNA induction in activated B cells was impaired in 
cells with increased HIF due to reduced Vhl (Fig. 3d, Extended Data 
Fig. 4c, d). Tbx21 mRNA and T-bet protein levels were also lower in 
Figure 4 | mTORC1 activity in B cells regulates antibody qualities but is 
attenuated by hypoxia. a, Immunoblots of lysates prepared from activated 
B cells cultured overnight at 21% or 1% pO2, before (−) and after (+) 
re-stimulation with anti-IgM. b, Immunoblots of B cell extracts as in a, using 
conditionally pVHL-depleted cells with either normal (Vhl cKO) or 
deficient (V;H1;H2 cKO) HIF expression. c, d, Raptor promotes generation 
of high-affinity antibodies and switch to IgG. IgHb (donor B-cell-derived) 
allotype anti-NP antibodies were measured after immunization of IgHa 
allotype mice that had received wild-type or RptorΔ/+ B cell transfers. 
ELISA results for all-affinity anti-NP IgG in primary response sera from 
recipient mice (n = 9 WT, n = 8 RptorΔ/+), captured on NP29 (c), and 
high-affinity antibodies (IgM, 1:100; IgG1, 1:50) captured on NP2 (d). IgG2c was 
undetectable, as in c. e, f, mTORC1 promotes AID expression. GFP in 
B220+-gated cells by flow cytometry after B cells were cultured for 4 days 
with LPS, BAFF and IL-4 or IFNγ as indicated. e, Rptor+/+ (WT) or 
RptorΔ/+ AID-GFP transgenic mice. f, Rptor+/+ AID-GFP cells cultured 
in the presence of mTORC1 inhibitor rapamycin (Rapa; 10 nM) or vehicle. 
Data are mean ± s.e.m.; in e, f, n = 3, 4 replicates, respectively. 


pVHL-depleted B cells (Fig. 3d, Extended Data Fig. 4c). To test the 
impact of decreased AID and T-bet, we forced expression of these 
proteins in activated B cells. T-bet did not increase the frequency of 
IgG2c-positive B cells during PHD inhibition, although it bypassed 
the need for IFNγ with control B cells (Fig. 3e, Extended Data Fig. 4e). 
By contrast, forcing AID expression normalized switching in these 
assays (Fig. 3e, Extended Data Fig. 4e). We conclude that the PHD/HIF/ 
VHL axis regulates the qualities of antibody responses, with modulation 
of AID levels as a major mechanism for hypoxic influence on the 
Ig class preferences. 

B cell activation, class switch recombination, and development into 
antibody-secreting cells are effected by receptors that stimulate mTOR. 
Hypoxia and HIF-1 have been shown to either inhibit or enhance 
mTORC1 activity in tumour or endothelial cells'”!®. In hypoxic and 
DMOG-treated B cells, BCR engagement elicited less phosphorylation 
of proteins downstream from mTORC1 (Fig. 4a, Extended Data 
Fig. 7a). Depletion of pVHL also reduced BCR-stimulated mTORC1 
by a HIF-dependent mechanism (Fig. 4b). Thus, hypoxia restrained 
mTORC1 in normal B cells. In vitro experiments suggest that HIF-mediated 
limitation of increased amino acid transport contributes to 
this effect. B cell activation increased leucine uptake and expression of 
transporters used for nutrient uptake; HIF stabilization impaired this 
induction (Extended Data Fig. 7b-e). Moreover, adequate supplies of 
leucine were crucial, and partially sufficient, for BCR re-activation of 
mTORC1 in B lymphoblasts (Extended Data Fig. 7f). HIF depletion 
did not completely restore either the antibody response or amino acid 
uptake to normal in pVHL-deficient B cells. However, two additional 
mechanisms previously shown to suppress mTORC1 were evoked 
in hypoxic B cells in vitro: steady-state ATP pools were halved, 
accompanied by increased AMPK activity, and expression of the Redd1 
(also known as Ddit4) gene was increased (Extended Data Fig. 8a-c). 

Disruption of mTOR function by means that impair both mTORC2 
and mTORC1 altered the balance between class-switched and IgM 
antibody against specific antigen'®”°. By contrast, HIF stabilization 
only partially inhibited mTORC1 and spared mTORC2 (Extended Data 
Fig. 8d, e). Accordingly, we tested whether partially reduced mTORC1 
activity affects high-affinity antibody production, proliferation, AID 
levels, or biases of Ig class switching using disruption of Rptor, which 
encodes a protein essential for mTORC1 (ref. 21). Rptor haploinsufficiency 
in B cells reduced mTORC1 activity (Extended Data Fig. 9a), 
and yielded results of in vitro switching and humoral responses 
in vivo (Fig. 4c, d, Extended Data Fig. 9) similar to those obtained 
with hypoxia and the PHD/HIF/VHL axis. IgG2c reductions were 
more substantial than those of IgM or IgG1 (Fig. 4c), and NP-specific 
GCB cells and IgG2c anti-NP-antibody-secreting cells (Extended Data 
Fig. 9b-d) were reduced. Partial mTORC1 loss reduced switching to 
IgG2c (Extended Data Fig. 10a) and suppressed high-affinity IgG1 
antibody production (Fig. 4d). IgG1 switch conditions promoted 
higher expression of a tracking allele, green fluorescent protein 
(GFP)-tagged AID, which was partially reduced by Rptor hemizygosity 
(Fig. 4e), whereas IgG2c conditions led to less AID in control cells 
and a greater reduction in RptorΔ/+ B cells. Moreover, Rptor 
haploinsufficiency led to reduced T-bet expression, and decreased mRNA 
levels of both Tbx21 and Aicda in activated B cells (Extended Data 
Fig. 10b, c). Pharmacological inhibition of mTOR with rapamycin 
substantially reduced AID levels'®”° (Fig. 4f) and switching to IgG2c, an 
effect mitigated by forced AID and T-bet expression (Extended 
Data Fig. 10d-f). Overall, localized hypoxia and HIF induction are 
normal features of GC microphysiology that modulate the output 
from lymphoid follicles, effects similar to those of restricting 
mTORC1 activity. 

Low oxygen tension confronts B cells in GCs during an immune 
response. The findings reveal that restricted oxygen supply or persistent 
induction of HIF transcription factors in B cells limits proliferation, 
isotype switching, and levels of high-affinity antibodies. GCB cells 
undergo iterative selection to enhance antibody affinity!” so that 
the most suitable B cells survive, further mature, and continue to 
multiply. Thus, the restriction of pO2 in the GC may slow proliferation 
and set a more stringent threshold for crucial survival signals. In 
addition, the IgG2c isotype has particular functions in anti-microbial 
responses and inflammation owing to the affinities of its constant 
region with the spectrum of Fc receptors on cells”. Many patients with 
hypoxaemic lung disease exhibit lower serum IgG levels and heightened 
susceptibility to respiratory infection?*. Hypoxia has also been 
recognized as a major aspect of inflammation in disease states. 
Intratumoral restrictions of oxygenation elicit indirect effects on 
immune function in cancer and may also act directly on T cells”*”°. 
Moreover, hypoxia and neo-lymphoid tissue or tertiary lymphoid 
structures with GCs, plasma cells, and local antibody production are 
now recognized in a wide range of inflammatory settings in which the 
oxygen landscape is unexplored”®. The hypoxia response program in 
intestinal epithelial cells limits local inflammation!?”””®, providing 
counter-regulation against activated neutrophils””. Analogous to this, 
the susceptibility of IgG2c to hypoxia may represent another means for 
limiting pathology from unchecked inflammation in normal 
immunity. 

Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 

Mice and B cell transfer models. Mice (C57BL/6 mice, CD45.1 congenic, Ig CH 
allotype-disparate (IgHa), Rag0, AID-GFP Tg, pVHL conditional knockout (Vhlf/f; 
ERT2-Cre; ref. 29), pVHL; HIF-1α; HIF-2α triple conditional knockout (Vhlf/f;Hif1af/f; 
Epas1f/f;ERT2-Cre), and raptor conditional knockout (Rptorf/+;ERT2-Cre+)) were 
housed in ventilated micro-isolators under specified pathogen-free conditions 
in a Vanderbilt University mouse facility and used at 6-8 weeks of age following 
approved protocols. Healthy mice of useful genotype were randomly selected 
for the experiments, without preference to size, gender, or other potential 
confounding factor. All figures are based on data reproduced in independent 
biological replicates, typically conducted weeks or months apart in time and 
involving different cages of donor and recipient mice, and always with parallel 
handling and manipulation of the mice and cells of samples to be compared. For 
adoptive transfer experiments, B cells (from 1-2 donor mice of each genotype) 
were purified by depleting T cells using biotinylated anti-Thy1.2 antibody followed 
by streptavidin-conjugated microbeads (iMag; BD Biosciences). Pooled wild-type 
CD4¢ T cells and OT-II CD4* T cells (4 x 10° and 1 x 10° cells per recipient, 
respectively, typically from two donor mice of each background) were purified 
by positive selection with L3T4 anti-CD4 microbeads and, in adoptive transfers 
into Rag0 or Ig CH allotype-disparate (IgHa) mice, mixed with pools of wild-type, 
VhlΔ/Δ, VhlΔ/Δ;Hif1aΔ/Δ;Epas1Δ/Δ, or RptorΔ/+ B cells (5 x 108 cells per recipient) 
and injected intravenously (i.v.) into Rag0 or IgHa recipients. Recipient mice of 
similar ages (6-8 weeks) were randomly selected for the experiments, without 
preference to size or gender. Experiments using the conditional Vhl alleles (VhlΔ/Δ) 
were designed to avoid distortions rapidly imposed by systemic pVHL loss (for 
example, extra-medullary haematopoiesis; ref. 29). Those using Rptorf/+ drove excision 
with the same Rosa26;ERT2-Cre allele and with tamoxifen-initiated Cre activity so 
as to be more directly comparable to the Vhl experiments and because of distortions 
of B cell development observed even for heterozygotes with mb1-Cre (deletion at 
outset of B lymphoid ontogeny) (A.L.R. and M.R.B., unpublished observations). 
Reagents. IFNγ, IL-4 and monoclonal antibodies (purified, biotinylated or 
fluorophore-conjugated) were from BD Pharmingen or Tonbo Biosciences 
unless otherwise indicated. IL-5 was from Peprotech; TGFβ and BAFF were 
from R&D Systems. NP-BSA (for capture ELISA), NP-OVA (4-hydroxy-3- 
nitrophenylacetyl hapten conjugated to ovalbumin, cat no. N-5051-100), and 
NP-O-succinimide (4-hydroxy-3-nitrophenylacetic acid active ester, or NP-Osu, 
for NP-allophycocyanin (APC) conjugation; cat no. N-1010) were obtained 
from Biosearch. SRBCs (sheep red blood cells), D-glucose, and 2-deoxyglucose 
were from Thermo Fisher Scientific. Tamoxifen, 4-hydroxy-tamoxifen, chicken 
ovalbumin, all-trans retinoic acid and LPS were from Sigma-Aldrich Chemicals. 
DMOG (HIF-hydroxylase inhibitor dimethyloxalylglycine, Calbiochem cat no. 
400091) and oligomycin were from EMD Millipore. Fluorescent proteins APC 
and (R)-phycoerythrin (rPE; Prozyme) were used for conjugation reactions with 
NP-O-succinimide to generate fluor-conjugated NP. 

Immunohistochemistry, flow cytometry and detection of hypoxia. C57BL/6 
mice were immunized with SRBCs (2 x 10° cells per mouse). At 1 week after 
immunization, mice were injected with EF5 (ref. 31) or pimonidazole HCl 
(hypoxyprobe). Spleen, lymph node and Peyer's patches were embedded in OCT 
reagent and snap frozen on dry ice. Sections of frozen tissue were fixed with 
4% paraformaldehyde, permeabilized with 0.5% Triton X-100 in PBS, blocked 
with M.O.M. (Vector Laboratory) followed by incubation with GL7-FITC, IgD- 
PE, and anti-EF5-Cy5 antibodies at 4°C. For hypoxyprobe detection, frozen 
sections were stained with biotinylated anti-pimonidazole antibody followed by 
streptavidin-conjugated Alexa 647. Biotinylated anti-CD35 antibody (BD 
Pharmingen, clone 8C12) followed by streptavidin-conjugated phycoerythrin 
(PE) was used for indirect immunofluorescent detection of follicular 
dendritic cells (FDC). Quantification of HIF-1α, EF5 and hypoxyprobe fluorescent 
intensity within GCs (total or light zone as defined by CD35 staining) and follicular 
regions was performed using MetaMorph image processing software. For the 
samples and negative controls, the regions were quantified in toto using the 
signal-averaged fluorescence intensity within each boundary (for example, CD35+ or 
GL7+). After subtracting the background mean fluorescence intensity (MFI) from 
negative control samples, MFI values of HIF-1α, EF5 and hypoxyprobe within GCs 
(CD35+ and CD35− zones) and GL7− follicular region were obtained. Data are 
presented as mean (± s.e.m.) MFI values for the individual samples (n > 20 GCs 
for each condition, drawing evenly on three independent experiments). In flow 
cytometric detection of hypoxic cells, BrdU incorporation, or cleaved caspase 3, cell 
surface markers were stained by fluor-conjugated monoclonal antibody, followed 
by fixation (4% paraformaldehyde), permeabilization with saponin (0.2%), and 
staining with anti-EF5-Cy5, or two-step staining of pimonidazole according to 
supplier's instructions. BrdU and cleaved caspase 3 were detected as described*”. 
For these and other flow cytometric analyses, fluorescence emission data on cell 
suspensions were collected on BD LSR or FACSCalibur flow cytometers driven 
by BD FACSDiva software, then processed using FlowJo software (TreeStar). 
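
The background correction described above for image quantification (subtracting the mean fluorescence intensity of negative-control regions, then reporting mean ± s.e.m.) can be sketched as follows. This is a minimal illustration only: the function names and the intensity values are hypothetical and are not taken from the MetaMorph workflow or from the paper's data.

```python
from math import sqrt
from statistics import mean, stdev


def background_subtracted_mfi(region_mfi, negative_control_mfi):
    """Subtract the mean negative-control MFI from each region's MFI."""
    background = mean(negative_control_mfi)
    return [value - background for value in region_mfi]


def mean_sem(values):
    """Return (mean, standard error of the mean) for at least two values."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical intensities in arbitrary units (not data from the paper).
gc_mfi = [120.0, 135.0, 128.0, 141.0]
control_mfi = [20.0, 22.0, 18.0]

corrected = background_subtracted_mfi(gc_mfi, control_mfi)
avg, sem = mean_sem(corrected)
print(f"GC MFI after background subtraction: {avg:.1f} ± {sem:.1f}")
```
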
Immunizations, and measurements of antibody responses. After collection of 
pre-immune sera, mice were immunized with NPj.-ovalbumin (OVA) (100p.g 
intraperitoneally) in alum (Imject, Thermo Fisher Scientific) as described**. 
Alternatively, this primary immunogen was mixed with NP-modified SRBCs, 
followed by a boost with NP-OVA in alum. Relative levels of anti-NP antibodies 
in immune sera were assayed by ELISA on serial dilutions binding to either NP29-BSA 
(high valency, to capture all affinities of antibodies) or NP2 (low valency, 
to restrict binding to the high-affinity antibody). Specific classes or isotypes 
were then detected using the series of isotype-specific second antibody of the 
SBA Clonotyping System (Southern Biotech), as described*’. Data for antigen- 
specific antibodies are shown after subtraction of low absorbance (A) values 
from pre-immune controls analysed together with the immune sera and were 
separately determined to match values yielded by titration. Antibody-secreting 
cells were analysed by ELISpot as previously described** and quantitated using 
an ImmunoSpot Analyzer (Cellular Technology). Antigen-specific B cells were 
detected and enumerated using flow cytometry to score B lineage-marked cells 
binding to fluor (APC or rPE)-conjugated NP, using a dump channel (7-AAD and 
APC-conjugated monoclonal antibody against IgD, F4/80, Gr1, CD11b, CD11c, 
CD4, and CD8) to exclude non-specific signal. 
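
The subtraction of pre-immune absorbance values from immune-serum values across a dilution series, as described above for the ELISA data, might look like the sketch below. The dictionary layout, the function name and the choice to floor negative differences at zero are assumptions made for illustration; the absorbance values are invented.

```python
def subtract_preimmune(immune, preimmune):
    """Return antigen-specific absorbance per dilution.

    Both arguments map a dilution factor (e.g. 100 for 1:100) to an
    absorbance reading. Negative differences are floored at 0.0, which
    is an assumption of this sketch rather than a stated method.
    """
    if immune.keys() != preimmune.keys():
        raise ValueError("immune and pre-immune dilution series must match")
    return {
        dilution: max(immune[dilution] - preimmune[dilution], 0.0)
        for dilution in immune
    }


# Hypothetical absorbance readings (not data from the paper).
immune_sera = {100: 1.25, 400: 0.75, 1600: 0.25}
preimmune_sera = {100: 0.25, 400: 0.25, 1600: 0.25}

specific = subtract_preimmune(immune_sera, preimmune_sera)
for dilution, absorbance in specific.items():
    print(f"1:{dilution} -> {absorbance:.2f}")
```
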

Gene expression profiling. Mice were injected with SRBCs and euthanized 
10 days after immunization. Single-cell suspensions from spleens were stained with 
anti-B220 (RA3-6B2) and anti-GL7. B220+GL7− and B220+GL7+ splenocytes 
were sorted into TRIzol reagent (Ambion). Total RNA was isolated from 
biological replicates and provided to the Vanderbilt VANTAGE shared resource 
for library construction and sequencing. Briefly, libraries were constructed from 
poly-adenylated RNAs and sequenced with an Illumina HiSeq 2500 on an SR-50 
run aiming for 30M reads/sample. Reads were aligned to the mm10 mouse 
transcriptome using TopHat and differential gene expression was determined 
using Cuffdiff as previously described*4. Gene set enrichment analysis (GSEA) 
was performed using software available from the Broad Institute (http://www. 
broadinstitute.org/gsea), which tested for enrichment based on hypergeometric 
distribution with respect to published gene signatures. For hypoxia regulated gene 
signature, GSEA plots comparing a gene set pre-ranked by log2 fold change in 
gene expression (GL7+B220+ versus GL7−B220+) to a hypoxia signature published 
previously*® were generated. A significant enrichment was defined as having a false 
discovery rate (FDR) q<0.05. 
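
The pre-ranking step mentioned above (ordering genes by log2 fold change between GL7+ and GL7− B cells before enrichment analysis) can be illustrated with a short sketch. The gene names, expression values and the pseudocount of 1 are hypothetical choices for this example; the actual ranking was produced with the TopHat/Cuffdiff and GSEA tools named in the text.

```python
from math import log2


def prerank_by_log2_fold_change(gc_expr, non_gc_expr, pseudocount=1.0):
    """Rank genes by log2 fold change (GC versus non-GC), highest first.

    A pseudocount is added to both values to avoid division by zero;
    this is an assumption of the sketch, not a documented parameter.
    """
    ranked = []
    for gene in gc_expr.keys() & non_gc_expr.keys():
        fold = (gc_expr[gene] + pseudocount) / (non_gc_expr[gene] + pseudocount)
        ranked.append((gene, log2(fold)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical expression values for placeholder genes.
gl7_pos = {"GeneA": 7.0, "GeneB": 1.0, "GeneC": 3.0}
gl7_neg = {"GeneA": 1.0, "GeneB": 7.0, "GeneC": 3.0}

ranking = prerank_by_log2_fold_change(gl7_pos, gl7_neg)
for gene, lfc in ranking:
    print(f"{gene}\t{lfc:+.2f}")
```
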

In vitro B cell cultures for class-switched antibody production. Splenic B cells 
were purified (90-95%) by depleting T cells using biotinylated anti-Thy1.2 
monoclonal antibody followed by streptavidin-conjugated microbeads. For IgG1, 
B cells (0.5 × 10^6 cells ml−1) were activated with LPS or F(ab′)2 anti-IgM (Southern 
Biotechnology) and anti-CD40 (BD Pharmingen), cultured with BAFF and IL-4. 
For IgG2c, B cells (0.5 × 10^6 cells ml−1) were activated with LPS or anti-IgM and 
anti-CD40, cultured with BAFF and IFNγ. For IgA, B cells (0.5 × 10^6 cells ml−1) 
were activated with LPS (1 µg ml−1) or anti-IgM and anti-CD40 and cultured with 
BAFF (10 ng ml−1), TGFβ (5 ng ml−1), IL-4 (10 ng ml−1), IL-5 (10 ng ml−1), and 
all-trans retinoic acid (RA) (10 nM) in IMDM medium supplemented with 10% 
FBS, penicillin/streptomycin, L-glutamine, and β-mercaptoethanol. To analyse the 
partitioning of cell division, purified B cells were stained with CellTrace Violet 
(Thermo Fisher Scientific) according to the manufacturer's instructions or CFDA-SE 
as described previously*’. Cells were cultured (4 days) at pO2 of 21%, 5% or 1%, 
after which surface Ig was analysed by flow cytometry. In comparisons of all three 
oxygen tensions, experiments were performed by dividing one common pool of B 
cells and using two separate hypoxia chambers maintained at constant pO2 using 
nitrogen. 

Measurements of RNA and proteins. RNA was isolated using TRIzol reagent 
(Invitrogen). After cDNA synthesis by reverse transcription, expression of genes 
was analysed in duplicate samples using SYBR green PCR master-mix (Qiagen) 
by quantitative reverse transcriptase PCR (qRT-PCR). Data are presented as values 
normalized to wild-type control, and averaged over PCR normalized to levels of 
internal control (actin). Primer pairs and cycler conditions are freely available on 
request. Proteins in whole-cell extracts were separated by SDS-PAGE, transferred 
onto nylon membranes (Millipore), and then incubated with rabbit antibodies 
against p-S6 (S235/236), p-p70S6K (S371), p-Akt (S473), p-Akt (T308), p-ACC 
(S79), p-AMPK (T172) (Cell Signaling Technologies), or goat anti-actin (Santa 
Cruz) antibodies followed by the appropriate fluorophore-conjugated, species- 
specific secondary anti-Ig antibodies (Rockland Immunochemicals, and LI-COR). 
Proteins were visualized and quantitated by laser excitation and infrared imaging 
(Odyssey, LI-COR). For measurements of the induction of S6K, S6 and Akt 
phosphorylation, purified B cells were cultured 2 days in BAFF (10 ng ml−1) and 
F(ab′)2 anti-IgM (1 µg ml−1), washed, rested 18 h, and then re-stimulated (15 min) 
in the presence or absence of F(ab′)2 anti-IgM (2.5 µg ml−1). To test the effect of 
amino acid supply on S6K and S6 phosphorylation, B lymphoblasts were washed, 
cultured in complete medium overnight, then rinsed, cultured in amino acid-free 
RPMI1640 (US Biological) for 1h, and re-stimulated in the presence or absence 
of anti-IgM, with readdition of L-leucine (Sigma) or all 20 amino acids. For the 
induction of p-ACC and p-AMPK, purified B cells were cultured for 2 days in 
LPS, BAFF and IFNγ. 
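
For the qRT-PCR data, the text states only that values were normalized to an internal control (actin) and to the wild-type control. One common way to do this is the 2^(−ΔΔCt) calculation sketched below. Its use here is an assumption, as are the function name, the threshold-cycle (Ct) inputs and the implied 100% amplification efficiency; the numbers are invented.

```python
def relative_expression(ct_target, ct_actin, ct_target_ref, ct_actin_ref):
    """Relative mRNA level by the 2^(-ddCt) method.

    ct_target / ct_actin: Ct values for the sample of interest.
    ct_target_ref / ct_actin_ref: Ct values for the reference
    (for example, wild-type control) sample.
    Assumes doubling of product each cycle (100% efficiency).
    """
    delta_sample = ct_target - ct_actin
    delta_reference = ct_target_ref - ct_actin_ref
    return 2.0 ** -(delta_sample - delta_reference)


# Hypothetical Ct values (not data from the paper).
fold = relative_expression(
    ct_target=26.0, ct_actin=18.0, ct_target_ref=24.0, ct_actin_ref=18.0
)
print(f"Expression relative to reference: {fold:.2f}")
```

A value below 1 indicates lower expression of the target in the sample than in the reference after normalization to actin.
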

Glycolysis and oxygen consumption assays. Purified B cells were cultured for 
2 days at 37°C at pO2 of 21% (normoxia) or pO2 of 1% (hypoxia) in the presence 
of BAFF, LPS and IFNγ. To quantify glycolysis, 1 x 10° viable cells were washed, 
pulsed with 10 µCi of 5-[3H]glucose in 24-well plates (37°C, 1h), and returned to 
their previous oxygen condition. Glycolytic conversion was then quantitated as 
described**. Oxygen consumption rates were measured using Seahorse assays. 
Because this instrument cannot be used in a hypoxia chamber, purified B cells 
(1 x 10° cells ml−1) were activated with 1 µg ml−1 LPS and cultured 48h with 
10 ng ml−1 BAFF in complete IMDM medium supplemented as described* in the 
presence or absence of 0.5mM DMOG. After 48h, cultured B cells were washed 
twice, resuspended in XF Base Media (Seahorse Bioscience) supplemented with 
2mM L-glutamine, and equal numbers of Trypan Blue-excluding B cells (1.5 x 10°) 
were plated on extracellular flux assay plates (Seahorse Bioscience) coated with 
CellTak (Corning) according to the manufacturer's protocol. Before extracellular 
flux analysis, B cells were rested (25 min at 37 °C, atmospheric CO2) in XF Base 
Media. Oxygen consumption rate (OCR) and the extracellular acidification rate 
(ECAR) were measured using a XF96 extracellular flux analyser (Seahorse 
Bioscience) before and after the sequential addition of 10 mM D-glucose, 1 µM 
oligomycin, and 50 mM 2-deoxyglucose. 

Amino acid uptake assay. Purified B cells were activated and cultured for 2 days 
with LPS and BAFF. Viable cells were washed and incubated with amino acid 
uptake buffer (5.4 mM KCl, 140 mM NaCl, 1.8 mM CaCl2, 0.8 mM MgSO4, 5 mM 
D-glucose, 25 mM HEPES, and 25 mM Tris, pH 7.5) for 30 min to deplete intracellular 
amino acids. Triplicate samples (1 x 10° cells per sample) were incubated with 1 µCi 
of L-[3,4,5-3H]leucine (American Radiolabelled Chemicals, Inc.) in amino acid 
uptake buffer for 2 min at room temperature and immediately spun through a layer 
of bromododecane (200 µl) into 8% sucrose/20% perchloric acid (25 µl). Tubes 
were frozen in a dry ice/ethanol bath and cut with dog nail clippers to separate 
the cells from unincorporated [3H]leucine. Then 25 µl of 10% Triton X-100 and liquid 
scintillation cocktail were added and the cell-associated 3H was measured by 
liquid scintillation counting. 

Statistical analysis. The primary analyses were conducted on pooled data points 
from independent samples and replicate experiments (minimum two, generally 
three, biologically and temporally independent replicate experiments for all data, 
with multiple independent samples in the case of two biological replicates), using 
an unpaired two-tailed Student's t-test with post-test validation of its suitability. 
Welch's or Mann-Whitney tests were used instead of the t-test where indicated 
based on statistical analysis of the distribution of variances in the samples to be 
compared. Data are displayed as mean ± s.e.m., that is, 'centre values' were the mean 
as 'average'. Results were considered statistically significant when the P value 
for the null hypothesis of a comparison was <0.05. Since the extent or direction 
of difference between samples was unknown, and regulations mandate reducing 
the number of animals used to the lowest feasible level, no statistical methods 
were used to determine pre-specified sample sizes. The experiments were not 
randomized and the investigators were not blinded during the experiments. 
Corrections for multiple comparisons were not used. Statistical approaches for 
RNA-seq-related data are outlined in that section. 
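
To make the distinction between the unpaired Student's t-test and Welch's variant concrete, the sketch below computes both test statistics and their degrees of freedom using only the Python standard library. The sample values are invented and the function names are illustrative. Converting a statistic to a P value additionally requires the t-distribution, which a statistics package would normally supply; that step is omitted here.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Unpaired Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical measurements for two groups (not data from the paper).
control = [10.0, 12.0, 14.0]
treated = [4.0, 6.0, 8.0]

t_student, df_student = student_t(control, treated)
t_welch, df_welch = welch_t(control, treated)
print(f"Student: t = {t_student:.3f}, df = {df_student}")
print(f"Welch:   t = {t_welch:.3f}, df = {df_welch:.2f}")
```

With equal group sizes and equal variances, as in this toy input, the two statistics coincide; they diverge when variances differ between groups, which is the situation in which the text says Welch's test was substituted.
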


29. Liu, Q., Davidoff, O., Niss, K. & Haase, V. H. Hypoxia-inducible factor regulates 
hepcidin via erythropoietin-induced erythropoiesis. J. Clin. Invest. 122, 
4635-4644 (2012). 

30. Guertin, D. A. et al. Ablation in mice of the mTORC components raptor, rictor, 
or mLST8 reveals that mTORC2 is required for signaling to Akt-FOXO and 
PKCα, but not S6K1. Dev. Cell 11, 859-871 (2006). 
Extended Data Figure 1 | Landscape of hypoxic cells in follicles and 
GCs of lymphoid organs. a, b, Controls for anti-HIF-1α antibody staining 
of GCs and portions of the surrounding splenic follicle, as in Fig. 1a, b, 
with fluorescent signals at the same intensity settings when analysing 
samples processed together, using SRBC immunization of wild-type and 
Hif-deleted mice and either anti-HIF-1α sera or non-immune rabbit IgG 
(rIgG1), as indicated. Shown are flow cytometry results of intracellular 
staining performed after exposure of lymphoblasts of the indicated 
genotypes to 4-hydroxytamoxifen and hypoxia (a), and confocal images 
(original magnification, ×40) (b), as in Fig. 1a and b, respectively. 
c, Quantified data obtained from samples represented in Fig. 1c. Shown 
are the mean (± s.e.m.) specific fluorescence intensities of hypoxyprobe 
(anti-pimonidazole) staining in GCs (delimited as GL7+) and GL7− 
IgD+ follicular B cell regions after subtracting background signal (mean 
fluorescence intensities in these regions after anti-pimonidazole staining 
of samples from PBS-injected control mice). d, Immunostaining of EF5-modified 
cells. Shown are confocal microscopic images of spleen sections 
from SRBC-immunized mice injected with EF5 (left) or PBS (right) 2h 
before collection, followed by direct immunofluorescent staining of 
frozen sections with anti-GL7, anti-IgD and anti-EF5 antibodies, 
representative of the quantified data presented in Fig. 1d (n = 7 GC 
from 3 mice in biological replicate analyses). e, Representative images 
of mesenteric lymph nodes after injections and immunostaining as in 
Fig. 1c. f, Low magnification (×10; panels are 900 µm × 900 µm) image of 
anti-pimonidazole immunohistochemistry on spleen sections from SRBC-immunized 
mice injected with pimonidazole (left) or PBS (right) before 
collection. Among stained sections for both anti-pimonidazole and EF5, 
~75% of GC sections were unequivocally positive (n = 14 sections from 
4 spleens in biological replicate analyses). g, Representative images of 
Peyer's patches from non-immune, EF5-injected mice processed as in 
Fig. 1c (n = 6 samples from 3 mice in biological replicate analyses). 
h, Representative images of spleen sections from unimmunized mice 
injected with hypoxyprobe (left) or PBS (right) 3h before collection, 
processed in parallel with sections from immunized mice injected with 
probe, and imaged by confocal microscopy at the same time and settings 
as for the sections from immunized mice (for each, n = 4 sections 
from 2 spleens in independent biological replicates). i, GSEA plots 
comparing gene set pre-ranked by log2-fold change in relative expression 
(GL7+/GL7−) in a hypoxia gene signature. 
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Extended Data Figure 2 | Altered B cell survival, proliferation and 
metabolism in reduced Po,. a, Increased executioner caspase-3 activation 
in hypoxic B cells. Left, representative flow histograms of cleaved 
caspase-3 (CC3) in the B cell gate after activated B cells were cultured in 
Po, of 21% (normoxia) and 1% (hypoxia). B cells were stimulated with 
BAFE, LPS and IFN4, cultured for 4 days at the indicated oxygen tension 
and processed for detection of activated caspase-3 using fluorescent- 
conjugated CC3 antibody. Right, quantitative data for the frequencies of 
B cells positive for caspase-3 cleavage in three independent replicate 
experiments (mean + s.e.m.). b, O2 sufficiency enhances cell cycle rates. 
As ina, but cells were pulsed with BrdU and frequencies of S-phase during 
the cultures are displayed in relation to IgG2c switching. Left, a 
representative result. Right, quantification of the overall B220* cell 
populations in three independent replicate experiments. B cells were 
cultured for 4 days with BAFF, LPS and IFN. at the indicated oxygen 
levels, pulsed for 4h with BrdU, and then stained with anti-IgG2c, -B220 
and -BrdU antibodies after fixation, permeabilization and processing. 

c, d, Pools of purified wild-type B cells were stimulated with BAFF and
LPS, divided and cultured for 2 days in pO2 of 21% (normoxia) and
1% (hypoxia). c, Rates of glycolysis were measured after returning
to their previous oxygen conditions, using equal numbers of surviving
B cells after culture as detailed in the Methods. Glycolysis rates were
measured in three independent experiments (mean ± s.e.m.). d, Inhibition
of PHD activity decreases cellular respiration of B lymphoblasts. Purified
B cells were activated and cultured for 2 days with LPS and BAFF in the
presence or absence of DMOG (0.5 mM). The oxygen consumption rate
(OCR) was measured with cultured viable B cells (1.5 x 10° cells)
(see Methods). The OCR was measured from technical triplicates in one
experiment representative of three independent replicates with similar
results (mean ± s.d.). e, Metabolic gene expression profile of GL7+ GCB
cells. Genes showing significant expression changes in GL7+ GCB cells
were mined for genes important for the indicated cellular processes.
The heat map depicts values for the indicated genes shown as the value
derived as log2 of the fragments per kilobase per million (reads) after
adding 1 to each value (FPKM + 1). f, Hypoxia limits switch to IgG among
B cells activated via BCRs and CD40. As in Fig. 2a, except that the B cell
preparations were activated by cross-linking their surface IgM and CD40
without addition of LPS. g, Quantified mean fluorescence intensities for
GFP expression in the full set of replicate experiments conducted as in
Fig. 2d, presented as mean (± s.e.m.) data for each condition of culture
(pO2 of 21, 5 or 1%, with cytokines and retinoic acid for Ig class switch
conditions as indicated, and as for Fig. 2a, b).
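
The heat-map values in panel e are described as a logarithm of FPKM after adding 1 to each value. A minimal sketch of that transform follows, assuming the base-2 logarithm; the function name, gene labels and FPKM numbers are hypothetical and are not data from the study.

```python
import math


def log2_fpkm_plus_one(fpkm):
    """Return log2(FPKM + 1).

    Adding 1 before taking the logarithm maps a gene with zero
    expression to 0 instead of an undefined value.
    """
    if fpkm < 0:
        raise ValueError("FPKM cannot be negative")
    return math.log2(fpkm + 1.0)


# Hypothetical FPKM values, for illustration only.
example_fpkm = {"gene_a": 0.0, "gene_b": 1.0, "gene_c": 7.0}
transformed = {gene: log2_fpkm_plus_one(v) for gene, v in example_fpkm.items()}
print(transformed)
```

The pseudocount of 1 is the only design choice here: it compresses the dynamic range while keeping unexpressed genes on the scale.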

Extended Data Figure 3 | HIF stabilization alters B cell survival,
proliferation and class switched antibody level. a, Purified wild-type
B cells were activated and cultured for 4 days with LPS and BAFF in the
presence or absence of DMOG, after which frequencies of cells with
cleaved caspase 3 or BrdU uptake, as indicated, were measured as in
Extended Data Fig. 2 (representative result from one experiment among
n = 3 independent replicate experiments). b, Purified wild-type B cells
were activated and cultured in conditions for switching to IgG1, IgG2c
and IgA, as in Fig. 2a, b, but at atmospheric (21%) pO2 in the presence or
absence of DMOG. The frequencies of surface IgG1, IgG2c and IgA among
B220+-gated cells were measured as in Fig. 2 (see Methods). FACS plots
display the surface levels of IgG1, IgG2c and IgA on B220+-gated cells
in one experiment representative of three independent replicates.
c, HIF inhibition impedes the hypoxia-induced alteration of antibody class
switch choices. B cells were activated and cultured for 4 days with BAFF,
LPS and the indicated switching conditions as in Fig. 2a (IL-4, IgG1; IFNγ,
IgG2c; retinoic acid, TGFβ, IL-4 and -5, IgA) at pO2 of 21% (normoxia)
or 1% (hypoxia) in the presence or absence of the HIF inhibitor Bay
87-2243. FACS plots displaying the surface levels of IgG1, IgG2c and IgA
on B220+-gated cells in one representative result among three independent
experiments are shown. Flow data shown in this figure were acquired
on a BD FACSCalibur but otherwise analysed as detailed in the Methods.

Extended Data Figure 4 | Hypoxia and PHD inhibition repress T-bet
induction. a, b, B cells from wild-type mice were activated and cultured in
LPS, BAFF and IL-4 or IFNγ for 4 days under normoxic and hypoxic
conditions (a) or cultured with and without DMOG at pO2 of 21% (b).
Shown are results of immunoblots using anti-T-bet antibody along with
actin as a loading control. Shown is one representative result from three
independent experiments. c, HIF-dependent regulation of T-bet
expression by pVHL. B cells from wild-type or conditionally deleted
Vhl Δ/Δ and Vhl Δ/Δ Hif1a Δ/Δ Epas1 Δ/Δ (Vhl and V;H1;H2 cKO, respectively)
mice were activated and cultured for 4 days in LPS and BAFF in the
presence or absence of IFNγ, as indicated. Results of one representative
immunoblot (from three independent experiments) probed for HIF-1α,
T-bet and actin are shown. d, HIF superinduction by pVHL depletion
in B cells at 1% pO2. Wild-type B cells and B cells after conditional Vhl
deletion were activated, cultured in 1% pO2 as in Extended Data Fig. 1a, and
analysed by flow cytometry after processing together for indirect
immunofluorescent staining of intracellular HIF-1α as in Fig. 1a and
Extended Data Fig. 1a. Numbers denote the mean fluorescent intensity
(MFI) of the B cells of each type. e, Flow cytometric data from one
representative experiment as in Fig. 3e, in which B cells were transduced
with MIT, MIG, MIT-T-bet or pMx-GFP-AID retrovectors, and cultured
with BAFF and LPS ± IFNγ in the presence or absence of DMOG. The
frequencies of surface IgG2c+ events among B220+ cells analysed 4 days
after transduction are shown, with flow data from one experiment of three
independent experiments.

Extended Data Figure 5 | VHL regulates antigen-specific
antibody production. a, Schematic outline of adoptive transfer
experiments. B cells purified from tamoxifen-treated wild-type,
Vhl f/f, or Vhl f/f Hif1a f/f Epas1 f/f ERT2-Cre mice were transferred into
recipients after mixing with CD4+ T cells (polyclonal:OVA-specific
OT-II cells = 4:1). Recipients were analysed after primary immunization,
or, for memory responses, after the primary and a recall immunization.

b, As in Fig. 3a, except B cells from wild-type or conditionally deleted
Vhl knockout mice were mixed with CD4+ OT-II TCR transgenic T cells,
transferred into Ig CH allotype-disparate (IgHa) recipient mice, followed
by immunization with NP-OVA and collected 3 weeks after primary
immunization. Donor- (b allotype) and recipient- (a allotype) derived
NP-specific IgM and IgG1 levels in the sera were analysed by ELISA. The mean
(± s.e.m.) absorbance data averaging independent samples (n = 8 WT and
n = 7 Vhl cKO) obtained in two separate transfer experiments (measured
on the same ELISA plate) are shown. c, d, As in Fig. 3a, wild-type
or Vhl Δ/Δ (Vhl cKO) B cells were mixed with wild-type CD4+
T cells (polyclonal:OVA-specific OT-II cells = 4:1), and transferred into
Rag-deficient recipients that were then immunized with NP-OVA, and analysed
for NP-specific antibody levels 3 weeks after primary immunization (c)
or, for memory response, 9 weeks after the primary immunization and
1 week after a recall immunization (d) (n = 5 independent recipients per
genotype in two independent experiments). c, Mean (± s.e.m.) ELISA
data for all-affinity IgM anti-NP from the same samples as Fig. 3b are
shown. d, Impaired immune memory follows interference with the B cell
hypoxia response system. Terminal sera obtained from the recipient mice
(Fig. 3a) 1 week after recall immunization were analysed by ELISA for
all-affinity anti-NP antibodies of the indicated isotypes at the same time
as the primary response samples (as in c and Fig. 3a).

Extended Data Figure 6 | HIF-dependent regulation of antigen-specific
B cell population and antibody response by pVHL. a, b, As in Fig. 3,
wild-type, Vhl Δ/Δ (Vhl cKO), or Vhl Δ/Δ Hif1a Δ/Δ Epas1 Δ/Δ (V;H1;H2
cKO) B cells were mixed with wild-type CD4+ T cells
(polyclonal:OVA-specific OT-II cells = 4:1), transferred into Rag-deficient
recipients that were then immunized with NP-OVA and analysed for NP-specific
antibody levels after primary immunization as in Fig. 3b, c. Using the same
mice and samples as in Fig. 3b, c, cells in spleen secreting IgG2c anti-NP were
quantified by ELISpot and averaged as frequencies of antibody-secreting
cells (ASC) in the sample (a). Mean (± s.e.m.) frequencies for all samples
(n = 9 each) are shown. b, Anti-NP IgA levels in the sera of the samples
used in Fig. 3b were quantified by ELISA. c, d, VHL regulation of
antigen-specific GCs and memory B cells is HIF-dependent. As in Fig. 3b, c,
wild-type, Vhl cKO or V;H1;H2 cKO B cells were mixed with CD4+ T cells
(polyclonal:OVA-specific OT-II cells = 4:1), transferred into Rag-deficient
mice, immunized with NP-SRBC along with NP-OVA, boosted with NP-OVA
at 3 weeks after primary immunization, and analysed at 1 week after the
boost. Shown are the mean (± s.e.m.) frequencies or numbers of antigen
(NP)-binding B cells of GC (IgD− GL7+) (c), and early memory (IgD−
GL7− CD38+) phenotypes (d) derived from each donor population and
recovered in the recipient mice, as determined by enumeration and flow
cytometric phenotyping with fluor-conjugated NP antibody. P values, as
indicated in the figure, were derived using Welch’s test for comparisons
in a, c and d, where the variances were unequal but followed a normal
(Gaussian) distribution.
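
Welch's test, named in the caption above, compares two group means without assuming equal variances. The sketch below computes the test statistic and the Welch-Satterthwaite degrees of freedom using only the Python standard library. The function name and the sample values are hypothetical; the source does not state which software the authors used, and converting the statistic to a P value would need a t-distribution routine that is left out here.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # Squared standard error of each group mean; no pooled variance is used.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical measurements for two groups, for illustration only.
group_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
group_2 = [2.0, 4.0, 6.0, 8.0, 10.0]
print(welch_t(group_1, group_2))
```

Because the degrees of freedom are estimated from the data, they are generally not an integer and are never larger than the pooled-variance value of n_a + n_b - 2.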

Extended Data Figure 7 | Hypoxia interrupts an activation-induced
feed-forward loop in which mTORC1 increases leucine uptake
by B cells. a, PHD inhibition attenuates mTORC1 activity. Wild-type
B cells were activated with anti-IgM and cultured for 2 days in BAFF,
rested for 20 h in the presence or absence of DMOG, and then
re-stimulated for 20 min with anti-IgM. Shown are immunoblots probed with
anti-HIF-1α, anti-p-S6K, anti-p-S6 and anti-S6 antibody along with
anti-actin as a loading control. Data are the results from one representative
experiment among three independent replicates. b–f, Hypoxia and HIF
stabilization reduce leucine uptake and mTORC1 activation. b, c, Reduced
leucine uptake (b) and Slc7a5 mRNA encoding the large neutral amino
acid transporter LAT1 (c) with inhibition of PHD proteins or mTOR.
Wild-type cells were analysed after culture in 1% O2 or at pO2 of 21%, in
presence of vehicle, DMOG or mTORC1 inhibitor (rapamycin) as
indicated. b, B cell uptake of leucine, in n = 3 independent experiments.
c, Relative mRNA level, normalized to actin (n = 3 independent
experiments). d, e, Activated B cells of the indicated genotypes were
assayed for leucine uptake (d) and induction of the Slc7a5 gene encoding a
large neutral amino acid transporter (e). d, Leucine uptake by the cultured
cells, normalized in each independent experiment (n = 3) to activated
wild-type cells. e, VHL loss leads to HIF-dependent attenuation of Slc7a5
mRNA levels. Wild-type or conditional knockout B cells of the indicated
genotypes were activated and cultured at 21% O2 as in Fig. 3d. qPCR
results were normalized first to actin for level within a sample, and then
to the wild-type control in each independent experiment (n = 3). f, Leucine
stimulates mTORC1 activity in activated B cells. Activated wild-type
B cells, divided and cultured overnight in medium lacking or sufficient
for the indicated amino acid, were restimulated and analysed as in
Fig. 4a, b. Data are mean ± s.e.m.

Extended Data Figure 8 | Hypoxia promotes AMPK activity and
induction of the mTORC1 inhibitor REDD1 without repressing
mTORC2. a, B cells were activated and grown for 2 days in LPS and BAFF
at the indicated pO2 and in the presence or absence of IFNγ as indicated.
ATP concentrations in equal numbers of cells were then assayed. In each of
three replicate experiments with similar results, the [ATP] measured for
cells at conventional (21%) pO2 without IFNγ was set as 1, and the mean
(± s.e.m.) levels in each sample relative to this reference are shown for
three biological replicates. b, Immunoblot results after probing membranes
with anti-p-ACC, anti-p-AMPK (T172) and actin are shown for one
representative experiment. Numbers indicate the level of signal for cells
cultured in hypoxia or DMOG as compared to the reference value of the
sample cultured in conventional (21%) pO2, after normalization of each
sample according to its loading. c, Results of a representative RT-PCR
experiment measuring Redd1 mRNA in wild-type B cells (activated and
cultured as in b), with each sample first normalized to Actb mRNA and
then to vehicle-treated cells. d, e, Effect of VHL, hypoxia and DMOG on
Akt phosphorylation in B cells. d, B cells were activated with anti-IgM and
BAFF, cultured for 2 days and rested for 20 h under conditions of hypoxia
or normoxia in the presence or absence of DMOG, after which cells were
re-stimulated (20 min) with anti-IgM. e, As in d, B cells from wild-type or
conditionally deleted Vhl knockout mice were activated with anti-IgM in
the presence of BAFF, cultured for 2 days and rested for 20 h, after which
cells were re-stimulated (20 min) with anti-IgM. Shown are results of
immunoblots probed with antibodies directed against p-Akt (T308),
p-Akt (S473), and Akt. Numbers show the quantification of signal relative
to B cells that were not restimulated, after adjustment of each sample for
loading as determined by total Akt. Data shown are from one
representative experiment among three independent replicates.

Extended Data Figure 9 | mTORC1 regulates expansion of
antigen-specific B cells and antibody class spectrum. a, Results of immunoblots
using anti-raptor and anti-p-S6 antibodies, with anti-S6 antibody as a
loading control. B cells (wild-type or haploinsufficient for raptor) were
activated with F(ab′)2 anti-IgM and BAFF, cultured for 2 days and rested
for 20 h, after which cells were re-stimulated for 20 min with F(ab′)2
anti-IgM. Data shown are from one representative experiment among
three independent replicates. b, Recipient antibody controls for effect
of mTORC1 on class-switched antibody responses. As in Fig. 4c,
wild-type or raptor-haploinsufficient B cells (from heterozygous mice that
were Rosa26;ERT2-Cre, Rptor f/+ and converted to Rptor +/Δ by tamoxifen
injections) were mixed with CD4+ OT-II TCR transgenic T cells,
transferred into Ig CH allotype-disparate recipient mice, immunized with
NP-OVA, and obtained 3 weeks after primary immunization.
Donor-derived (b allotype) (in Fig. 4) or recipient-derived (a allotype)
NP-specific IgG1 and IgG2c levels in the sera were analysed by ELISA.
Absorbance data averaging samples (n = 9 WT versus n = 8 Rptor +/Δ)
obtained in three separate experiments (measured on the same ELISA
plate) are shown. c–e, Wild-type or Rptor +/Δ B cells were mixed with
CD4+ T cells (polyclonal:OVA-specific OT-II = 4:1) and transferred into
Rag-deficient mice and immunized with NP-OVA. Shown are the recoveries
of antigen (NP)-binding wild-type versus Rptor +/Δ B cells of GC
(B220+ GL7+ IgD−) (c) and early memory (B220+ CD38+ GL7− IgD−) (d)
phenotypes. e, Generation of antigen-specific IgG2c-secreting cells
depends on mTORC1. Mean (± s.e.m.)
results of ELISpot assays quantitating NP-binding IgG2c (b allotype)
antibody-secreting cells from the experiments in b and Fig. 4c, d,
quantified as described in Extended Data Fig. 6a. P values were derived
using Welch’s test for comparisons in c–e, in which the variances were
unequal but followed a normal (Gaussian) distribution.

Extended Data Figure 10 | mTORC1 is rate-limiting for AID expression
and switching to IgG2c. a, A division-independent mechanism dependent
on mTORC1 quantity in B cell switching to IgG2c. Flow cytometric data in
the B cell gate, displaying carboxyfluorescein diacetate succinimidyl ester
(CFDA-SE) partitioning (fluorescein emission intensities) versus IgG2c,
were from one experiment representative of three independent biological
replicates. Wild-type or Rptor +/Δ B cells were stained with CFSE and
cultured with LPS, BAFF and IFNγ, and analysed by flow cytometry.
b, Wild-type or Rptor +/Δ B cells were cultured for 2 days with LPS, BAFF
and IFNγ. mRNA levels of the Aicda (left) and Tbx21 (right) genes
measured in three independent replicate experiments by qRT-PCR
normalized to actin in the sample and then to the level in wild-type
cells (set as relative level of 1). c, Immunoblots probed for raptor, T-bet
and actin, as indicated, using B cells as in b (representative of n = 3
independent experiments). d, mTOR promotes switching to IgG by
division-independent mechanisms. As in a, but CFSE-stained wild-type
B cells were activated and cultured for 4 days with LPS, BAFF and IFNγ
in the presence or absence of rapamycin versus vehicle. e, f, mTORC1
regulation of AID level in collaboration with T-bet determines efficient
switching to IgG2c. B cells were transduced with MIT, MIG, MIT-T-bet
or pMx-GFP-AID retrovectors, and cultured with BAFF and LPS and/or
IFNγ in the presence or absence of rapamycin (5 nM). e, Representative
flow data, from one experiment among three independent replicates,
derived as in Extended Data Fig. 4e. f, Frequencies of surface IgG2c+
events among B220+ cells analysed 4 days after transduction are shown
(n = 3 independent experiments). Data are mean ± s.e.m.


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


LETTER 


doi:10.1038/nature19346 


The long non-coding RNA Morrbid regulates Bim 
and short-lived myeloid cell lifespan 


Jonathan J. Kotzin, Sean P. Spencer, Sam J. McCright, Dinesh B. Uthaya Kumar, Magalie A. Collet,
Walter K. Mowel, Ellen N. Elliott, Asli Uyar, Michelle A. Makiya, Margaret C. Dunagin, Christian C. D. Harman,
Anthony T. Virtue, Stella Zhu, Will Bailis, Judith Stein, Cynthia Hughes, Arjun Raj, E. John Wherry, Loyal A. Goff,
Amy D. Klion, John L. Rinn, Adam Williams, Richard A. Flavell & Jorge Henao-Mejia


Neutrophils, eosinophils and ‘classical’ monocytes collectively 
account for about 70% of human blood leukocytes and are among 
the shortest-lived cells in the body. Precise regulation of the
lifespan of these myeloid cells is critical to maintain protective
immune responses and minimize the deleterious consequences
of prolonged inflammation. However, how the lifespan of these
cells is strictly controlled remains largely unknown. Here we 
identify a long non-coding RNA that we termed Morrbid, which 
tightly controls the survival of neutrophils, eosinophils and classical 
monocytes in response to pro-survival cytokines in mice. To control 
the lifespan of these cells, Morrbid regulates the transcription of 
the neighbouring pro-apoptotic gene, Bcl2l11 (also known as Bim),
by promoting the enrichment of the PRC2 complex at the Bcl2l11
promoter to maintain this gene in a poised state. Notably, Morrbid
regulates this process in cis, enabling allele-specific control of
Bcl2l11 transcription. Thus, in these highly inflammatory cells,
changes in Morrbid levels provide a locus-specific regulatory 
mechanism that allows rapid control of apoptosis in response to 
extracellular pro-survival signals. As MORRBID is present in 
humans and dysregulated in individuals with hypereosinophilic 
syndrome, this long non-coding RNA may represent a potential 
therapeutic target for inflammatory disorders characterized by 
aberrant short-lived myeloid cell lifespan. 

Neutrophils, eosinophils and ‘classical’ monocytes represent a first
line of defense against nearly all pathogens. However, these
short-lived myeloid cells also contribute to the development of several
inflammatory diseases. Cytokines and metabolites tightly regulate
the function and lifespan of these cells, but how these cues are
translated into an optimal cellular lifespan is largely unknown. Emerging
evidence indicates that certain long non-coding RNAs (lncRNAs) can
integrate extracellular inputs with chromatin-modification pathways
allowing cells to rapidly adapt to their environment. As such, we
investigated whether lncRNAs control the function or lifespan of
short-lived myeloid cells in response to extracellular cues. We first analysed
multiple RNA sequencing (RNA-seq) datasets for mouse lncRNAs that
are preferentially expressed by mature short-lived myeloid cells. We
identified an uncharacterized lncRNA (Gm14005) that we termed
Morrbid (myeloid RNA regulator of Bim-induced death). Morrbid is
conserved across species, contains five exons, is poly-adenylated and is
localized predominantly to the nucleus bound to chromatin (Fig. 1a, b,
Extended Data Fig. 1a–d). Importantly, Morrbid is highly and specifically
expressed by mature eosinophils, neutrophils and classical monocytes
in both mice and humans (Fig. 1c, d, Extended Data Fig. 1e, f).

To investigate the role of Morrbid in vivo, we deleted the Morrbid
locus to generate Morrbid-deficient mice (Extended Data Fig. 1g).
Notably, and in accordance with the expression profile of Morrbid, we
found that eosinophils, neutrophils and Ly6Chi classical monocytes
were markedly reduced in the blood and tissues of these mice (Fig. 1e,
Extended Data Fig. 1h, i). This defect was highly specific to these three
cell types, as well as blood Ly6Clo monocytes (Extended Data Fig. 2a),
which are suggested to be progeny of Ly6Chi monocytes. All other
lymphoid and myeloid cell types were unaffected (Extended Data
Fig. 1i, 2a). Similarly, knockdown of Morrbid in vivo also led to a specific
reduction in the frequency of short-lived myeloid cells in blood and
spleen (Extended Data Fig. 2b–e). Finally, as these cells have a critical role
in protective immunity and in the development of immunopathology,
we found that Morrbid-deficient mice were highly susceptible to
bacterial (Listeria monocytogenes) infection (Fig. 1f, g), and protected
from eosinophil-driven allergic lung inflammation (Extended Data
Fig. 2f–h). Altogether, these results support an important and selective
role for Morrbid and potentially DNA elements within its locus in
short-lived myeloid cell homeostasis.

Eosinophils, neutrophils and Ly6Chi monocytes originate from
common progenitors in the bone marrow (BM), with extracellular
cues driving the developmental programs needed to produce each of
these cell types. Using mixed BM chimaeras, we found that
Morrbid-deficient BM cells have a significant defect in the generation of
short-lived myeloid cells (Extended Data Fig. 3a–e), indicating that Morrbid
acts in a cell-intrinsic manner. We next sought to determine whether
Morrbid regulates short-lived myeloid cell development. Early
progenitors of each of these cell types express low levels of Morrbid and
its expression increases throughout development to reach maximal
levels in fully mature eosinophils, neutrophils and Ly6Chi monocytes
(Extended Data Fig. 3f–h). In accordance with this pattern of
expression, the progenitors of each of these cell types were intact in
Morrbid-deficient mice (Fig. 2a, Extended Data Fig. 3g–h). These results suggest
that Morrbid regulates the frequency of mature eosinophils, neutrophils
and monocytes, but not their progenitors.

Mature populations of myeloid cells are controlled by several mech- 
anisms, including homeostatic proliferation, trafficking, and cell death. 
We found no defects in homeostatic proliferation in Morrbid-deficient 
mice (Extended Data Fig. 4a). Mature short-lived myeloid cells are 

Figure 1 | lncRNA Morrbid is a critical regulator of eosinophils,
neutrophils and Ly6Chi monocytes. a, Human neutrophil and mouse
granulocyte normalized RNA-seq and ChIP-seq tracks at the Morrbid
locus. b, Single molecule Morrbid RNA fluorescence in situ hybridization
(FISH). c, d, qPCR expression of mouse (n = 3; representative of
3 independent experiments) (c) and human Morrbid in indicated cell types
and tissues (n = 7) (d). e, Wild-type and Morrbid-deficient flow cytometry
plots and absolute counts (n = 3–5; representative of 7 independent
experiments). f, g, L. monocytogenes infection of wild-type and
Morrbid-deficient mice. f, Survival and weight loss (n = 9, representative of
3 independent experiments). g, Colony-forming units (CFUs) per g from
indicated organs (n = 5; representative of 3 independent experiments).
Error bars show s.e.m. *P < 0.05, **P < 0.01 and ***P < 0.001 (two-sided
t-test, e, g, f (right); Mantel–Cox test, f (left)).

Figure 2 | Morrbid controls eosinophil, neutrophil and Ly6Chi
monocyte lifespan. a, Schematic of short-lived myeloid cell development
and absolute numbers of the indicated cell types in BM from wild-type
and Morrbid-deficient mice (n = 3–5; representative of 3 independent
experiments). b, Frequency of Casp+ (Z-VAD-FMK+) cultured BM
cells (n = 3 mice; representative of 2 independent experiments).
c, Half-life of BrdU pulse-labelled neutrophils in blood in vivo (n = 4
mice; representative of 3 independent experiments). d, Bcl2l11 qPCR
expression in indicated cell types sorted from BM (n = 3; representative of
2 independent experiments). e, BCL2L11 protein expression assessed by
flow cytometry in indicated BM cell types. Left, representative histograms.
Right, mean fluorescence intensity (MFI) quantification (n = 3–5 mice,
representative of 3 independent experiments). Error bars show s.e.m.
*P < 0.05, **P < 0.01 and ***P < 0.001 (two-sided t-test).

substantially reduced in the BM of Morrbid-deficient mice and there
was a near absence of in vitro BM-differentiated eosinophils (Fig. 2a,
Extended Data Fig. 4b, c), suggesting that Morrbid controls a
dominant process independent of cell trafficking. Notably, Morrbid-deficient
eosinophils, neutrophils and Ly6Chi monocytes were all highly prone
to apoptosis in BM cultured ex vivo (Fig. 2b, Extended Data Fig. 4d).
Furthermore, we observed significantly increased apoptosis in vitro in
BM-derived eosinophils (Extended Data Fig. 4e), and in vivo during
L. monocytogenes infection in the absence of Morrbid (Extended Data
Fig. 4f). Given the close relationship between apoptosis and cellular
lifespan, we hypothesized that Morrbid is a regulator of short-lived
myeloid cell half-life. Using BrdU to label circulating neutrophils and
determine their decay rate, we observed an ~2-fold decrease in the
half-life of these cells (Fig. 2c, Extended Data Fig. 4g). These results
indicate that Morrbid regulates short-lived myeloid cell lifespan
through control of apoptosis.
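
One common way to turn a label-decay time course into a half-life is to assume first-order (single-exponential) decay and fit a straight line to the logarithm of the labelled fraction against time. The sketch below does that with an ordinary least-squares fit. This is an assumption made for illustration: the text does not state how the half-life was calculated, and the function name and numbers are hypothetical.

```python
import math


def half_life_from_decay(times_h, labelled_fraction):
    """Estimate a half-life from a label-decay time course.

    Assumes first-order decay, f(t) = f0 * exp(-k * t), so that
    ln f(t) is linear in t with slope -k and half-life = ln(2) / k.
    """
    if len(times_h) != len(labelled_fraction) or len(times_h) < 2:
        raise ValueError("need at least two paired observations")
    logs = [math.log(f) for f in labelled_fraction]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, logs))
    slope = sxy / sxx  # least-squares slope, equal to -k
    if slope >= 0:
        raise ValueError("labelled fraction does not decay over time")
    return math.log(2) / -slope


# Hypothetical time points (hours) and labelled fractions.
times = [0.0, 12.0, 24.0, 36.0]
fractions = [0.40, 0.20, 0.10, 0.05]
print(half_life_from_decay(times, fractions))
```

Under this model a 2-fold decrease in half-life corresponds to a doubling of the decay constant k.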

Some lncRNAs regulate the expression of neighbouring genes.
The pro-apoptotic gene Bcl2l11 (Bim) is located ~150-kb downstream
of Morrbid (Extended Data Fig. 1a). Bcl2l11 has been shown to be an
important regulator of myeloid homeostasis. Thus, we reasoned
that Morrbid regulates short-lived myeloid cell lifespan through its
control of Bcl2l11 expression. Indeed, the protein and mRNA levels of
Bcl2l11 were markedly elevated in eosinophils, neutrophils and Ly6Chi
monocytes from Morrbid-deficient mice (Fig. 2d, e, Extended Data
Fig. 4h–k). In concordance with the pattern of Morrbid expression,
Bcl2l11 was maximally elevated in the mature state of each of these cell
lineages in Morrbid-deficient mice (Extended Data Fig. 4l), and was not
dysregulated in other myeloid and lymphoid cell populations (Extended
Data Fig. 4m). Importantly, key myeloid lineage transcription factors
and other genes neighbouring Morrbid were largely unaffected in the
absence of Morrbid (Extended Data Fig. 5a–c). These results suggest
that Morrbid represses Bcl2l11 expression in short-lived myeloid cells.

To specifically address the role of Morrbid RNA in the regulation of
Bcl2l11 expression, we first established an in vitro eosinophil culture
system in which we could study the function of Morrbid RNA in the
absence of genetic disruptions (Extended Data Fig. 6a). Using this
system, we found that short hairpin RNA (shRNA)-mediated
knockdown of Morrbid RNA results in a significant elevation in BCL2L11,
which was accompanied by a substantial decrease in eosinophil survival
(Fig. 3a–c, Extended Data Fig. 6b–d). We observed similar results
using transfection of locked nucleic acids (LNAs) as an independent
knockdown technique (Extended Data Fig. 6e). We next sought to
corroborate these results in a different cell type within the myeloid cell
lineage. Interestingly, we found that lipopolysaccharide (LPS)-stimulated
BM-derived macrophages (BMDMs) highly upregulated Morrbid
(Extended Data Fig. 6f). Notably, LNA knockdown of Morrbid,
deletion of the Morrbid promoter, or deletion of its locus in LPS-stimulated
BMDMs resulted in a marked increase in Bcl2l11 expression and
apoptosis (Extended Data Fig. 6f–l). Altogether, these results indicate that
Morrbid RNA is a critical regulator of Bcl2l11 expression and short-lived
myeloid cell survival.

Pro-survival cytokines can potently influence the lifespan of immune
cells. One well-described mechanism of this control is through the
repression of Bcl2l11 (refs 15, 16). We hypothesized that cytokines
from the common β-chain receptor family (IL-3, IL-5 and GM-CSF),
which are known to promote the survival of eosinophils, neutrophils
and Ly6Chi monocytes, regulate Bcl2l11 expression through the
induction of Morrbid. To test this hypothesis, we first withdrew cytokines
from cultured BM-derived eosinophils and observed a loss of Morrbid
expression and an increase in Bcl2l11 levels (Fig. 3d). Subsequent
addition of IL-5, IL-3 or GM-CSF induced Morrbid expression, which was
accompanied by Bcl2l11 repression (Fig. 3d). Similarly, ex vivo β-chain
cytokine stimulation, but not G-CSF stimulation, significantly induced
Morrbid and a corresponding repression of Bcl2l11 in neutrophils and
Ly6Chi monocytes (Fig. 3e, Extended Data Fig. 6m). Importantly,
Morrbid-deficient neutrophils were unable to inhibit Bcl2l11 expression

Figure 3 | Pro-survival cytokines repress pro-apoptotic Bcl2l11
through induction of Morrbid RNA. a–c, BM-derived eosinophils
transduced with control or Morrbid-specific shRNAs. a, Morrbid qPCR
expression. b, c, BCL2L11 protein expression (b) and absolute eosinophil
counts (c) (n = 3 mice per group, representative of 2 independent
experiments). d, Morrbid and Bcl2l11 qPCR expression in BM-derived
eosinophils following withdrawal/stimulation with indicated cytokines
(n = 3 mice, representative of 2 independent experiments). e, f, Morrbid
and Bcl2l11 qPCR expression in wild-type (e) and Morrbid-deficient
(f) sorted BM cell types stimulated with indicated cytokines (n = 3–4
mice, representative of 3 independent experiments). g–i, MORRBID
expression in human hypereosinophilic syndrome (HES). Absolute
eosinophil count (g), purified eosinophil MORRBID qPCR expression (h),
and correlation between log(plasma IL-5) and MORRBID expression (i)
(each dot represents one individual; n = 2–12 per disease group). FHES,
familial HES; PDGFRA, PDGFRA+ HES; EAE, episodic angioedema
and eosinophilia; LHES, lymphocytic variant HES; HEUS, HES of
undetermined significance; Para, parasitic infection. Error bars show s.e.m.
*P < 0.05, **P < 0.01 and ***P < 0.001 (two-sided t-test, a–h; Spearman's
correlation, i).


upon addition of β-chain cytokines (Fig. 3f). These results suggest that
β-chain cytokines repress Bcl2l11 expression in short-lived myeloid
cells in a Morrbid-dependent manner.

Dysregulated immune cell survival is central to many human 
haematological and inflammatory diseases. Hypereosinophilic 
syndrome (HES) is a group of disorders characterized by eosinophilia 
and a wide range of clinical manifestations'’. Several HES subtypes 
have been associated with increased production or responsiveness to 
IL-5 (ref. 17). We therefore reasoned that eosinophils from individuals 
with HES would overexpress MORRBID, and that this overexpression 
would positively correlate with IL-5 levels. We screened patients with 
varied subtypes of HES (Fig. 3g), and found that eosinophils from these 
patients expressed significantly higher levels of MORRBID than those from
healthy controls (Fig. 3h). Additionally, we observed that MORRBID
expression in eosinophils was positively correlated with plasma IL-5
levels (Fig. 3i). These results suggest a potential role for MORRBID in
HES and other inflammatory diseases characterized by high levels of
β-chain cytokines and altered short-lived myeloid cell lifespan.
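
The correlation reported in Fig. 3i is a Spearman rank correlation between log(plasma IL-5) and MORRBID expression. As a minimal sketch of that statistic, the Python below computes Spearman's coefficient with average ranks for ties. The function names and all numbers are hypothetical illustrations, not data or code from this study. Because the statistic depends only on ranks, log-transforming IL-5 does not change it.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical values for six individuals (NOT data from the paper).
plasma_il5 = [2.0, 8.0, 15.0, 40.0, 120.0, 300.0]
morrbid_expression = [1.1, 0.9, 2.5, 3.0, 2.8, 6.2]

rho_raw = spearman(plasma_il5, morrbid_expression)
rho_log = spearman([math.log10(v) for v in plasma_il5], morrbid_expression)
print(rho_raw, rho_log)
```

The two printed values are identical, which is why the choice of log scale on the x axis affects the plot but not the reported correlation.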

Genes that require both tight regulation and the ability to be rapidly 
activated frequently have activating (H3K4me3) and repressive 
(H3K27me3) histone marks in their promoters, termed bivalent 
promoters18. The Bcl2l11 gene has previously been described as having
a bivalent promoter, which allows this pro-apoptotic gene to be maintained
in a poised state19. A number of lncRNAs have been shown to
repress gene expression by promoting the enrichment of polycomb
repressive complex 2 (PRC2) at target genes, which in turn catalyzes the
deposition of H3K27me3 (refs 20, 21). We therefore hypothesized that
Morrbid represses Bcl2l11 expression and prevents short-lived myeloid
cell apoptosis by promoting PRC2 enrichment and H3K27me3
deposition at the bivalent promoter of Bcl2l11.

To test this hypothesis, we first performed chromatin immunoprecipitation
followed by quantitative PCR (ChIP-qPCR) for total polymerase
II (Pol II), H3K27me3 and the PRC2 subunit EZH2 in neutrophils from
wild-type and Morrbid-deficient mice. In line with the elevated levels
of Bcl2l11 in Morrbid-deficient cells, we found that Pol II occupancy
was significantly increased (Extended Data Fig. 7a), and the levels
of H3K27me3 and EZH2 were markedly reduced at the promoter of
Bcl2l11 in the absence of Morrbid (Fig. 4a, b). We next asked whether
the induction of Morrbid expression promotes the accumulation of
PRC2 at the Bcl2l11 promoter. Using the BMDM system in which
Morrbid is induced upon LPS stimulation, we found that Morrbid levels
and EZH2 occupancy at the Bcl2l11 promoter concurrently increase in
a Morrbid-dependent manner (Extended Data Fig. 7b). Finally, using
ChIP-seq and ATAC-seq (assay for transposase-accessible chromatin
using sequencing), we did not detect changes in the activating histone
marks H3K4me1 and H3K4me3, and detected only a modest increase
in chromatin accessibility at the Bcl2l11 promoter in the absence of
Morrbid (Extended Data Fig. 7c-f). Altogether, these results indicate
that Morrbid represses Bcl2l11 expression in short-lived myeloid cells
by promoting the deposition of H3K27me3 at the bivalent promoter
of Bcl2l11.

lncRNAs have been suggested to promote the recruitment of PRC2
to target genes through direct lncRNA-PRC2 interactions or indirect
mechanisms’!°4. To understand further the mechanism by which 
Morrbid promotes PRC2 enrichment at the Bcl2l11 promoter, we
first examined whether Morrbid RNA associates with PRC2. Using a 
recently generated EZH2 photoactivatable ribonucleoside-enhanced 
crosslinking and immunoprecipitation (PAR-CLIP) dataset’, we 
found that Morrbid associates with EZH2 (Extended Data Fig. 8a). To 
further support this association, we performed RNA immunoprecipitation
against EZH2 in myeloid cells and found that Morrbid significantly
co-immunoprecipitates with this PRC2 subunit (Extended Data
Fig. 8b). We next asked whether Morrbid RNA associates with chromatin
regions within the Bcl2l11 promoter, with which PRC2 also
associates. We performed chromatin isolation by RNA purification
(ChIRP)-qPCR in LPS-treated BMDMs. Using DNA probes that
specifically and robustly retrieved Morrbid RNA (Extended Data
Fig. 8c), we found that Morrbid association with chromatin was
significantly enriched at the Bcl2l11 promoter (Fig. 4c). Finally, we
asked how Morrbid RNA comes into proximity of the Bcl2l11 promoter.
A number of IncRNA genes have been reported to loop into prox- 
imity with the genes that they regulate!”"!*”°; thus, we reasoned that 
the Morrbid and Bcl2l11 loci interact with one another through DNA
looping. Using chromosome conformation capture (3C), we indeed
observed a long-distance association between Bcl2l11 and the Morrbid
locus in short-lived myeloid cells (Fig. 4d, Extended Data Fig. 8d).
Altogether, these results suggest a model in which Morrbid proximity
to Bcl2l11, mediated through DNA looping, enables Morrbid RNA to
promote PRC2 enrichment within the Bcl2l11 promoter through direct
Morrbid-PRC2 interactions and potentially through additional indirect
mechanisms.

Our findings suggest an important role for PRC2 in Morrbid-dependent
repression of Bcl2l11. Yet, whether short-lived myeloid
cell survival depends on PRC2-mediated transcriptional repression
of Bcl2l11 is not known. We cultured eosinophils in the presence of a
specific inhibitor of EZH2, GSK126. We observed a dose-dependent
increase in BCL2L11 and eosinophil apoptosis upon PRC2 inhibition
(Fig. 4e, f, Extended Data Fig. 8e, f). Importantly, Bcl2l11-deficient
eosinophils were resistant to cell death following abrogation of PRC2
activity (Fig. 4f, Extended Data Fig. 8e, f). Altogether, these results
demonstrate that PRC2 regulates short-lived myeloid cell survival
specifically through repression of Bcl2l11 expression, further supporting
a critical role for Morrbid in the regulation of the lifespan of these
highly inflammatory cells.

Finally, we found that Morrbid-heterozygous mice largely recapitulated
the phenotype of mice lacking both alleles of Morrbid (Extended
Data Fig. 8g). In light of this dominant heterozygous phenotype and
the observed Morrbid-Bcl2l11 DNA loop, we hypothesized that Morrbid
functions in cis to repress Bcl2l11. As such, we expected that deletion
of Bcl2l11 on the same chromosome as the Morrbid-deficient
allele would normalize Bcl2l11 expression in short-lived myeloid cells
and rescue their numbers, but that deletion of Bcl2l11 on the opposite
chromosome would not (Fig. 4g). We therefore generated all permutations
of Morrbid and Bcl2l11 double-heterozygous mice (Extended Data
Fig. 9). Notably, deletion of Bcl2l11 in cis, but not in trans, to the
Morrbid-deficient allele normalized Bcl2l11 expression (Fig. 4h) and
rescued short-lived myeloid cell numbers (Fig. 4i, j, Extended Data
Fig. 10a, b). Other cell types were largely unaltered in these genetic
backgrounds (Extended Data Fig. 10b-d). This complete rescue in
cis double-heterozygous mice indicates that Morrbid acts in an
allele-specific manner to regulate Bcl2l11 expression and short-lived
myeloid cell lifespan.
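
The logic of the cis versus trans rescue can be written out as a toy model. The sketch below is an illustration only, assuming the simplest reading of the hypothesis: an intact Bcl2l11 allele is derepressed when, and only when, the Morrbid locus on the same chromosome is deleted. The class, function and genotype labels are hypothetical names introduced here, and the model makes no quantitative claim about expression levels.

```python
from typing import NamedTuple, Tuple

class Chromosome(NamedTuple):
    morrbid: bool  # True if the Morrbid locus is intact on this chromosome
    bcl2l11: bool  # True if Bcl2l11 is intact on this chromosome

def derepressed_copies(genotype: Tuple[Chromosome, Chromosome]) -> int:
    """Count intact Bcl2l11 alleles that lack an intact Morrbid locus in cis."""
    return sum(1 for c in genotype if c.bcl2l11 and not c.morrbid)

WILD_TYPE = Chromosome(morrbid=True, bcl2l11=True)

# Hypothetical genotype labels for the four relevant configurations.
genotypes = {
    "wild type": (WILD_TYPE, WILD_TYPE),
    "Morrbid+/-": (Chromosome(False, True), WILD_TYPE),
    "double het, in cis": (Chromosome(False, False), WILD_TYPE),
    "double het, in trans": (Chromosome(False, True), Chromosome(True, False)),
}

for name, genotype in genotypes.items():
    print(f"{name}: {derepressed_copies(genotype)} derepressed Bcl2l11 allele(s)")
```

Under this assumption the in cis deletion removes the only derepressed allele, whereas the in trans deletion leaves it in place, which matches the direction of the rescue described above.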

Here we show that Morrbid integrates extracellular signals to control
the lifespan of eosinophils, neutrophils and monocytes through
allele-specific suppression of Bcl2l11 expression (Extended Data
Fig. 10e). As this ncRNA is present in humans and dysregulated in
patients with HES, a better understanding of how Morrbid RNA and
potentially DNA elements within its locus regulate Bcl2l11 may provide
new therapeutic approaches for several human inflammatory diseases.
Finally, our results demonstrate that lncRNAs can function as highly
cell-type-specific local effectors of extracellular cues to control immunological
processes that require rapid and strict regulation.


Online Content Methods, along with any additional Extended Data display items and
Source Data, are available in the online version of the paper; references unique to
these sections appear only in the online paper.

METHODS


Mice. All mice were bred and maintained under pathogen-free conditions at an 
American Association for the Accreditation of Laboratory Animal Care accredited 
animal facility at the University of Pennsylvania or Yale University. Mice were 
housed in accordance with the procedures outlined in the Guide for the Care 
and Use of Laboratory Animals under an animal study proposal approved by an 
institutional Animal Care and Use Committee. Male and female mice between 
4 and 12 weeks of age were used for all experiments. Littermate controls were used 
whenever possible. 

C57BL/6 (wild type) and B6.SJL-Ptprca Pepcb/BoyJ (B6.SJL) mice were purchased
from The Jackson Laboratory. We generated Morrbid-deficient mice and the
in cis and in trans double-heterozygous (Morrbid+/-, Bcl2l11+/-) mice using
the CRISPR/Cas9 system as previously described”®. In brief, to generate Morrbid- 
deficient mice, single guide RNAs (sgRNAs) were designed against regions flanking 
the first and last exon of the Morrbid locus (Extended Data Fig. 1g). Cas9-mediated 
double-stranded DNA breaks resolved by non-homologous end joining (NHEJ) 
ablated the intervening sequences containing Morrbid in C57BL/6N one-cell 
embryos. The resulting founder mice were Morrbid-/+, which were then bred
to wild-type C57BL/6N and then intercrossed to obtain homozygous Morrbid-/- 
mice. One Morrbid-deficient line was generated. To control for potential off-target 
effects, mice were crossed for at least 5 generations to wild-type mice and then 
intercrossed to obtain homozygosity. Littermate controls were used when possible 
throughout all experiments. 

To generate the in cis and in trans double-heterozygous (Morrbid+/-,
Bcl2l11+/-) mice, we first obtained mouse one-cell embryos from a mating between
Morrbid-/- female mice and wild-type male mice. As such, the resulting one-cell
embryos were heterozygous for Morrbid (Morrbid+/-). We then micro-injected
sgRNAs designed against intronic sequences flanking the second exon of Bcl2l11,
which contains the translational start site/codon, into Morrbid-/+ one-cell embryos
(Extended Data Fig. 9). Cas9-mediated double-stranded DNA breaks resolved by
NHEJ ablated the intervening sequences containing the second exon of Bcl2l11
in Morrbid+/- (C57BL/6N) one-cell embryos, generating founder mice that were
heterozygous for both Bcl2l11 and Morrbid (Bcl2l11+/-; Morrbid-/+). Founder
heterozygous mice were then bred to wild-type C57BL/6N to interrogate the
segregation of the Morrbid-deficient and Bcl2l11-deficient alleles (Extended Data
Fig. 9). Pups in which these alleles segregated were labelled in trans, and pups in
which they did not segregate were labelled in cis. One line each of in cis and in trans
double-heterozygous (Bcl2l11+/-; Morrbid-/+) mice was generated. To control for
potential off-target effects, mice were crossed for at least 5 generations to wild-type
(C57BL/6N) mice (for in cis) and to Morrbid-/- mice (for in trans) to maintain
heterozygosity. To determine genetic rescue, samples from mice containing different
permutations of Morrbid and Bcl2l11 alleles (Fig. 4g-j) were analysed in a blinded
manner by a single investigator not involved in the breeding or coding of these samples.

Flow cytometry staining, analysis and cell sorting. Cells were isolated from the 
indicated tissues (blood, spleen, bone marrow, peritoneal exudate, adipose tissue). 
Red blood cells were lysed with ACK. Single-cell suspensions were stained with 
CD16/32 and with indicated fluorochrome-conjugated antibodies. If run live, cells 
were stained with 7-AAD (7-amino-actinomycin D) to exclude non-viable cells. 
Otherwise, before fixation, Live/Dead Fixable Violet Cell Stain Kit (Invitrogen) 
was used to exclude non-viable cells. Active caspase staining using Z-VAD-FMK 
(CaspGLOW, eBiosciences) was performed according to the manufacturer's speci- 
fications. Apoptosis staining by annexin V+ (Annexin V Apoptosis Detection kit) 
was performed according to the manufacturer’s recommendations. BrdU staining 
was performed using BrdU Staining Kit (eBioscience) according to the manufac- 
turer’s recommendations. For BCL2L11 staining, cells were fixed for 15 min in 
2% formaldehyde solution, and permeabilized with flow cytometry buffer sup- 
plemented with 0.1% Triton X-100. All flow cytometry analysis and cell-sorting 
procedures were done at the University of Pennsylvania Flow Cytometry and Cell 
Sorting Facility using BD LSRII cell analysers and a BD FACSAria II sorter running
FACSDiva software (BD Biosciences). FlowJo software (version 10, TreeStar)
was used for data analysis and graphic rendering. All fluorochrome-conjugated 
antibodies used are listed in Supplementary Table 2. 

Western blotting. 1 x 10° wild-type and Morrbid-deficient neutrophils sorted from 
mouse bone marrow were assayed for BCL2L11 protein expression by western 
blotting (Bim C34C5 rabbit monoclonal antibody, Cell Signaling), as previously 
described. 

ChIP-qPCR. 2 x 10° wild-type and Morrbid-deficient neutrophils sorted from 
mouse bone marrow were cross-linked in a 1% formaldehyde solution for 5 min 
at room temperature while rotating. Crosslinking was stopped by adding glycine 
(0.2 M in 1 x PBS (phosphate buffered saline)) and incubating on ice for 2 min. 
Samples were spun at 2500g for 5 min at 4°C and washed 4 times with 1 x PBS. The
pellets were flash frozen and stored at -80°C. Cells were lysed, and nuclei were
isolated and sonicated for 8 min using a Covaris S220 (105 Watts, 2% duty cycle,
200 cycles per burst) to obtain approximately 200-500 bp chromatin fragments.
Chromatin fragments were pre-cleared with protein G magnetic beads (New
England BioLabs) and incubated with pre-bound anti-H3K27me3 (Qiagen), anti-EZH2
(eBiosciences), or mouse IgG1 (Santa Cruz Biotechnology) antibody-protein
G magnetic beads overnight at 4°C. Beads were washed once in low-salt buffer
(20 mM Tris, pH 8.1, 2 mM EDTA, 50 mM NaCl, 1% Triton X-100, 0.1% SDS),
twice in high-salt buffer (20 mM Tris, pH 8.1, 2 mM EDTA, 500 mM NaCl, 1%
Triton X-100, 0.1% SDS), once in LiCl buffer (10 mM Tris, pH 8.1, 1 mM EDTA,
0.25 mM LiCl, 1% NP-40, 1% deoxycholic acid) and twice in TE buffer (10 mM
Tris-HCl, pH 8.0, 1 mM EDTA). Washed beads were eluted twice with 100 µl of
elution buffer (1% SDS, 0.1 M NaHCO3) and de-crosslinked (0.1 mg ml-1 RNase,
0.3 M NaCl and 0.3 mg ml-1 Proteinase K) overnight at 65°C. The DNA samples
were purified with Qiaquick PCR columns (Qiagen). qPCR was carried out on
a ViiA7 Real-Time PCR System (ThermoFisher) using the SYBR Green detection
system and indicated primers. Expression values of target loci were directly
normalized to the indicated positive control loci, such as MyoD1 for H3K27me3
and EZH2 ChIP analysis, and Actb for Pol II ChIP analysis. ChIP-qPCR primer
sequences are listed in Supplementary Table 1. 
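
The text states that target-locus values were normalized directly to a positive control locus but does not give the formula. One common way to express such a normalization is a delta-Ct calculation, sketched below. This is an assumption about the arithmetic, not the authors' documented procedure: the function name, the default amplification efficiency of 2 (perfect doubling per cycle) and the Ct values are all hypothetical.

```python
def relative_to_control(ct_target, ct_control, efficiency=2.0):
    """Signal at the target locus relative to the control locus.

    Assumes both primer pairs amplify with the same per-cycle efficiency
    (2.0 means perfect doubling). A lower Ct means more starting template.
    """
    return efficiency ** (ct_control - ct_target)

# Hypothetical Ct values from one ChIP sample (NOT data from the paper).
ct_bcl2l11_promoter = 27.0  # target locus
ct_positive_control = 25.0  # control locus, e.g. MyoD1 for H3K27me3 ChIP

enrichment = relative_to_control(ct_bcl2l11_promoter, ct_positive_control)
print(enrichment)
```

Comparing this ratio between wild-type and Morrbid-deficient samples, rather than raw Ct values, removes differences in immunoprecipitation yield that affect both loci equally.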

ATAC-seq preparation, sequencing, and analysis. 50,000 wild-type and knockout
cells, in triplicate, were spun at 500g for 5 min at 4°C, washed once with 50 µl
of cold 1x PBS and centrifuged in the same conditions. Cells were resuspended in
50 µl of ice-cold lysis buffer (10 mM Tris-HCl, pH 7.4, 10 mM NaCl, 3 mM MgCl2,
0.1% IGEPAL CA-630). Cells were immediately spun at 500g for 10 min at 4°C.
Lysis buffer was carefully pipetted away from the pellet, which was then resuspended
in 50 µl of the transposition reaction mix (25 µl 2x TD buffer, 2.5 µl Tn5
Transposase (Illumina), 22.5 µl nuclease-free water) and then incubated at 37°C
for 30 min. DNA purification was performed using a Qiagen MinElute kit and
eluted in 12 µl of Elution buffer (10 mM Tris buffer, pH 8.0). To amplify library
fragments, 6 µl of the eluted DNA was mixed with NEBnext High-Fidelity 2x PCR
Master Mix, 25 µM of customized Nextera PCR primers 1 and 2 (Supplementary
Table 1), 100x SYBR Green I and used in PCR as follows: 72°C for 5 min; 98°C for
30 s; and thermocycling 4 times at 98°C for 10 s; 63°C for 30 s; 72°C for 1 min. 5 µl
of the 5 cycles PCR amplified DNA was used in a qPCR reaction to estimate the
additional number of amplification cycles. Libraries were amplified for a total of
10-11 cycles and were then purified using a Qiagen PCR Cleanup kit and eluted in
30 µl of Elution buffer. The libraries were quantified using qPCR and bioanalyser
data, and then normalized and pooled to 2 nM. Each 2 nM pool was then denatured
with a 0.1 N NaOH solution in equal parts then further diluted to form a 20 pM
denatured pool. This pool was then further diluted down to 1.8 pM for sequencing
using the NextSeq500 machine on V2 chemistry and sequenced on a 1 x 75 bp
Illumina NextSeq flow cell.
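
The dilution series described above (2 nM pool, equal-volume NaOH denaturation, 20 pM, then 1.8 pM) follows from C1 x V1 = C2 x V2. The sketch below works through that arithmetic. The concentrations come from the text; the helper names and the final loading volume are hypothetical and chosen only for illustration.

```python
def dilution_factor(c_start_pm, c_end_pm):
    """Fold dilution needed to go from c_start_pm to c_end_pm (both in pM)."""
    return c_start_pm / c_end_pm

def stock_volume_ul(c_stock_pm, c_final_pm, v_final_ul):
    """Volume of stock required, from C1 * V1 = C2 * V2."""
    return c_final_pm * v_final_ul / c_stock_pm

pool_pm = 2.0 * 1000        # 2 nM pooled library, expressed in pM
denatured_pm = pool_pm / 2  # mixing with an equal volume of NaOH halves it

fold_to_20_pm = dilution_factor(denatured_pm, 20.0)
fold_to_loading = dilution_factor(20.0, 1.8)

FINAL_VOLUME_UL = 1300.0    # hypothetical final volume, not stated in the text
stock_needed_ul = stock_volume_ul(20.0, 1.8, FINAL_VOLUME_UL)

print(denatured_pm, fold_to_20_pm, round(fold_to_loading, 2), round(stock_needed_ul, 1))
```

The same calculation applies unchanged to the ChIP-seq libraries, which are described with the identical 2 nM, 20 pM and 1.8 pM steps.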

ATAC sequencing was done on an Illumina NextSeq at a sequencing depth of
~40-60 million reads per sample. Libraries were prepared in triplicate. Raw reads
were deposited under GSE85073. 2 x 75 bp paired-end reads were mapped to the
mouse mm9 genome using the 'bwa' algorithm with the 'mem' option. Only reads that
uniquely mapped to the genome were used in subsequent analysis. Duplicate reads
were eliminated to avoid potential PCR amplification artifacts and to eliminate
the high numbers of mtDNA duplicates observed in ATAC-seq libraries. Post-alignment
filtering resulted in ~26-40 million uniquely aligned singleton reads
per library and the technical replicates were merged into one alignment BAM file
to increase the power of open chromatin signal in downstream analysis. Depicted
tracks were normalized to total read depth. ATAC-seq enriched regions (peaks)
in each sample were identified with MACS2 using the settings below:

MACS2-2.1.0.20140616/bin/macs2 callpeak -t <input tag file> -f BED -n
<output peak file> -g 'mm' --nomodel --shift -100 --extsize 200 -B --broad

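
The statement that depicted tracks were normalized to total read depth can be illustrated with a simple scaling to reads per million. This is a minimal sketch under the assumption that normalization means dividing each bin's count by the library's total mapped reads; the function name, the per-million scale and the counts are hypothetical, and the authors' actual track-generation tool is not named in the text.

```python
def normalize_to_depth(bin_counts, total_reads, per=1_000_000):
    """Scale raw per-bin read counts to reads per `per` mapped reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return [count * per / total_reads for count in bin_counts]

# Two hypothetical libraries with the same underlying signal but different
# depths, chosen within the 26-40 million range quoted in the text.
shallow = normalize_to_depth([26, 52, 13], total_reads=26_000_000)
deep = normalize_to_depth([40, 80, 20], total_reads=40_000_000)
print(shallow, deep)
```

After scaling, both libraries give the same track heights, so differences between wild-type and knockout tracks reflect accessibility rather than sequencing depth.
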
ChIP-seq preparation, sequencing and analysis. 10 x 10° wild-type and knock- 
out mice neutrophils were cross-linked in a 1% formaldehyde solution for 10 min 
at room temperature while rotating. Crosslinking was stopped by adding glycine 
(0.2 M in 1 x PBS) and incubating on ice for 2 min. Samples were spun at 2500g 
for 5 min at 4°C and washed 4 times with 1 x PBS. The pellets were flash frozen 
and stored at -80°C. Cells were lysed and sonicated (Branson Sonifier 250) for
9 cycles (30% amplitude; time, 20 s on, 1 min off). Lysates were spun at 18,400g
for 10 min at 4°C and resuspended in 3 ml of lysis buffer. A sample of 100 µl was
kept aside as input and the rest of the samples were divided by the number of
antibodies to test. Chromatin immunoprecipitation was performed with 10 µg of
antibody-bound beads (anti-H3K27ac, H3K4me3, H3K4me1, H3K36me3
(Abcam) and anti-rabbit IgG (Santa Cruz), Dynal Protein G magnetic beads
(Invitrogen)) and incubated overnight at 4°C. Bead-bound DNA was washed,
reverse cross-linked and eluted overnight at 65°C, shaking at 950 r.p.m. Beads
were removed using a magnetic stand and eluted DNA was treated with RNase A
(0.2 µg µl-1) for 1 h at 37°C shaking at 950 r.p.m., then with proteinase K (0.2 µg µl-1)
for 2 h at 55°C. 30 µg of glycogen (Roche) and 5 M of NaCl were added to the
samples. DNA was extracted with 1 volume of phenol:chloroform:isoamyl alcohol
and washed out with 100% ethanol. Dried DNA pellets were resuspended in 30 µl
of 10 mM Tris HCl, pH 8.0, and DNA concentrations were quantified using Qubit.
Starting with 10 ng of DNA, ChIP-seq libraries were prepared using the KAPA
Hyper Prep Kit (Kapa Biosystems, Inc.) with 10 cycles of PCR. The libraries were
quantified using qPCR and bioanalyser data then normalized and pooled to 2 nM.
Each 2 nM pool was then denatured with a 0.1 N NaOH solution in equal parts then
further diluted to form a 20 pM denatured pool. This pool was then further diluted
down to 1.8 pM for sequencing using the NextSeq500 machine on V2 chemistry
and sequenced on a 1 x 75 bp Illumina NextSeq flow cell.

ChIP sequencing was done on an Illumina NextSeq at a sequencing depth of 
~30-40 million reads per sample. Raw reads were deposited under GSE85073. 
75 bp single-end reads were mapped to the mouse mm9 genome using ‘bowtie2’ 
algorithm. Duplicate reads were eliminated to avoid potential PCR amplification 
artifacts and only reads that uniquely mapped to the genome were used in subsequent
analysis. Depicted tracks were normalized to control IgG input sample.
ChIP-seq-enriched regions (peaks) in each sample were identified with MACS2
using the settings below:

MACS2-2.1.0.20140616/bin/macs2 callpeak -t <ChIP tag file> -c <control tag
file> -f BED -g 'mm' --nomodel --extsize=250 --bdg --broad -n <output peak file>

RIP-qPCR. 10’ immortalized BMDMs were collected by trypsinization and 
resuspended in 2 ml PBS, 2 ml nuclear isolation buffer (1.28 M sucrose; 40 mM
Tris-HCl, pH 7.5; 20 mM MgCl2; 4% Triton X-100), and 6 ml water on ice for
20 min (with frequent mixing). Nuclei were pelleted by centrifugation at 2,500g
for 15 min. Nuclear pellets were resuspended in 1 ml RNA immunoprecipitation
(RIP) buffer (150 mM KCl, 25 mM Tris, pH 7.4, 5 mM EDTA, 0.5 mM DTT, 0.5%
NP40; 100 U ml-1 SUPERaseIn, Ambion; complete EDTA-free protease inhibitor,
Sigma). Resuspended nuclei were split into two fractions of 500 µl each (for mock
and immunoprecipitation) and were mechanically sheared using a dounce homogenizer.
Nuclear membrane and debris were pelleted by centrifugation at 15,800g
for 10 min. Antibody to EZH2 (Cell Signaling 4905S; 1:30) or normal rabbit IgG
(mock immunoprecipitation, SantaCruz; 10 µg) was added to the supernatant and
incubated for 2 hours at 4°C with gentle rotation. 25 µl of protein G beads (New
England BioLabs S1430S) were added and incubated for 1 hour at 4°C with gentle
rotation. Beads were pelleted by magnetic field, the supernatant was removed,
and beads were resuspended in 500 µl RIP buffer; this wash was repeated for a total
of three RIP buffer washes, followed by one wash in PBS. Beads were resuspended in 1 ml
of Trizol. Co-precipitated RNAs were isolated, reverse-transcribed to cDNA, and
assayed by qPCR for Hprt and Morrbid-isoform1. Primer sequences are listed
in Supplementary Table 1. 

PAR-CLIP analysis. EZH2 PAR-CLIP dataset (GSE49435) was analysed as 
previously described”. Adapter sequences were removed from total reads and 
those longer than 17 bp were kept. The Fastx toolkit was used to remove duplicate
sequences, and the resulting reads were mapped using BOWTIE allowing
for two mismatches. The four independent replicates were pooled and analysed
using PARalyzer, requiring at least two T-to-C conversions per RNA-protein contact
site. lncRNAs were annotated according to Ensembl release 67.
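
Two of the filtering rules in this paragraph, keeping unique reads longer than 17 bp and requiring at least two T-to-C conversions at a contact site, can be shown with a short sketch. This is a simplified illustration, not a reimplementation of the Fastx toolkit, BOWTIE or PARalyzer: it assumes ungapped alignments given as (start, sequence) pairs, and the function names and sequences are hypothetical.

```python
def filter_reads(reads):
    """Keep unique reads longer than 17 nt, preserving first-seen order."""
    seen = set()
    kept = []
    for read in reads:
        if len(read) > 17 and read not in seen:
            seen.add(read)
            kept.append(read)
    return kept

def count_t_to_c(reference, aligned_reads):
    """Count T-to-C mismatches over ungapped alignments of (start, sequence)."""
    total = 0
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            if reference[start + offset] == "T" and base == "C":
                total += 1
    return total

# Hypothetical reference site and reads (NOT sequences from the dataset).
reference = "ACGTTGCATGTCAGTTACGA"
raw_reads = [
    "A" * 17,                # too short, dropped
    "ACGTCGCATGTCAGTTACGA",  # one T-to-C conversion at position 4
    "ACGTCGCATGTCAGTTACGA",  # exact duplicate, dropped
    "GTTGCACGTCAGTTACGA",    # aligns at position 2, one T-to-C conversion
]
unique_reads = filter_reads(raw_reads)
alignments = [(0, unique_reads[0]), (2, unique_reads[1])]
conversions = count_t_to_c(reference, alignments)
passes_threshold = conversions >= 2
print(unique_reads, conversions, passes_threshold)
```

Removing duplicates before counting matters here, because a duplicated read would otherwise contribute the same conversion twice and inflate the evidence for a contact site.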

Chromosome conformation capture (3C). 13 x 10° wild-type bone marrow 
derived mouse eosinophils were fixed with 1% formaldehyde for 10 minutes at room 
temperature, and quenched with 0.2 M glycine on ice. Eosinophils were lysed for 
3-4hours at 4°C (50 mM Tris, pH 7.4, 150mM NaCl, 0.5% NP-40, 1% Triton X-100, 
1x Roche complete protease inhibitor) and dounce-homogenized. Lysis was monitored
by Methyl-green pyronin staining (Sigma). Nuclei were pelleted and resuspended
in 500 µl 1.4x NEB3.1 buffer, treated with 0.3% SDS for one hour at 37°C,
and 2% Triton X-100 for another hour at 37°C. Nuclei were digested with 800 units
BglII (NEB) for 22 hours at 37°C, and treated with 1.6% SDS for 25 minutes
at 65°C to inactivate the enzyme. Digested nuclei were suspended in 6.125 ml
of 1.25x ligation buffer (NEB), and were treated with 1% Triton X-100 for one
hour at 37°C. Ligation was performed with 1,000 units T4 DNA ligase (NEB) for
18 hours at 16°C, and crosslinks were reversed by proteinase K digestion (300 µg)
overnight at 65°C. The 3C template was treated with RNase A (300 µg), and purified
by phenol-chloroform extraction. Digested and undigested DNA were run on
a 0.8% agarose gel to confirm digestion. To control for PCR efficiency, two bacterial
artificial chromosomes (BACs) spanning the region of interest were combined in
equimolar quantities and digested with 500 units BglII at 37°C overnight. Digested
BACs were ligated with 100 units T4 Ligase HC (Promega) in 60 µl overnight
at 16°C. Both BAC and 3C ligation products were amplified by qPCR (Applied
Biosystems ViiA7) using SYBR fast master mix (KAPA biosystems). Products were
run side by side on a 2% gel, and images were quantified using ImageJ. Intensity of
3C ligation products was normalized to intensity of respective BAC PCR product.
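
The final normalization step, dividing each 3C band intensity by the BAC control band amplified with the same primer pair, can be sketched as follows. The function name, primer-pair labels and intensities are hypothetical; only the ratio itself is taken from the text.

```python
def normalize_3c(intensity_3c, intensity_bac):
    """Divide each 3C band intensity by the BAC control for the same primers."""
    normalized = {}
    for primer_pair, value in intensity_3c.items():
        control = intensity_bac[primer_pair]
        if control <= 0:
            raise ValueError(f"BAC control for {primer_pair} must be positive")
        normalized[primer_pair] = value / control
    return normalized

# Hypothetical gel band intensities in arbitrary units (NOT data from the paper).
bands_3c = {"anchor+fragment1": 1200.0, "anchor+fragment2": 300.0}
bands_bac = {"anchor+fragment1": 2400.0, "anchor+fragment2": 3000.0}

interaction = normalize_3c(bands_3c, bands_bac)
print(interaction)
```

Because the BAC template contains every possible ligation product in equal amounts, this ratio corrects for primer pairs that simply amplify better than others.
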
Listeria monocytogenes infections. Mice were infected with 30,000 CFUs of
Listeria monocytogenes (strain 10403s, obtained as a gift from E. J. Wherry) intravenously
(i.v.). Mice were weighed and inspected daily. Mice were analysed at day 4 of
infection to determine the CFUs of L. monocytogenes present in the spleen and liver.

Papain challenge. Papain was purchased from Sigma Aldrich and resuspended
at 1 mg ml-1 in PBS. Mice were intranasally challenged with 5 doses of 20 µg papain
in 20 µl of PBS or PBS alone every 24 hours. Mice were killed 12 hours after the last
challenge. Bronchoalveolar lavage was collected in two 1 ml lavages of PBS. Cellular
lung infiltrates were collected after 1 hour digestion in RPMI supplemented with
5% FCS, 1 mg ml-1 collagenase D (Roche) and 10 µg ml-1 DNase I (Invitrogen)
at 37°C. Homogenates were passed through a cell strainer and infiltrates separated
with a 27.5% Optiprep gradient (Axis-Shield) by centrifugation at 1,175g for
20 min. Cells were removed from the interface and treated with ACK lysis buffer.

Bone marrow chimaeras. Congenic C57BL/6 (wild-type) bone marrow expressing
CD45.1 and CD45.2 and Morrbid-deficient bone marrow expressing CD45.2 were
mixed in a 1:1 ratio and injected into C57BL/6 hosts irradiated twice with 5 Gy
3 hours apart that express CD45.1 (B6.SJL-Ptprca Pepcb/BoyJ). Mice were analysed
between 4 and 9 weeks after injection.

Bone-marrow-derived eosinophils. Bone marrow was isolated and cultured as 
previously described’. Briefly, unfractionated bone marrow cells were cultured 
with 100 ng ml-1 stem cell factor (SCF) and 100 ng ml-1 FLT3-ligand (FLT3-L). At
day 4, the media was replaced with media containing 10 ng ml-1 interleukin-5 (IL-5).
Mature bone-marrow-derived eosinophils were analysed between days 10 and 14.

Bone-marrow-derived macrophage cultures. Bone marrow cells were isolated
and cultured in media containing recombinant mouse M-CSF (10 ng ml-1) for
7-8 days. On day 7-8, cells were re-plated for use in experimental assays.
Bone-marrow-derived macrophages were stimulated with LPS (250 ng ml-1) for the
indicated periods of time. 

ChIRP-qPCR. Briefly, 40 x 10’ Immortalized bone-marrow-derived macrophages 
were fixed with 40 ml of 1% glutaraldehyde for 10 min at room temperature. 
Crosslinking was quenched with 0.125 M glycine for 5 min. Cells were rinsed 
with PBS, pelleted for 4 min at 2,000g, snap-frozen in liquid nitrogen, and stored
at -80°C. Cell pellets were thawed at room temperature and resuspended in 800 µl
of lysis buffer (50 mM Tris-HCl, pH 7.0, 10 mM EDTA, 1% SDS, 1 mM PMSF,
complete protease inhibitor (Roche), 0.1 U ml-1 Superase In (Life Technologies)).
Cell suspension was sonicated using a Covaris S220 machine (Covaris; 100 W, duty
factor 20%, 200 cycles per burst) for 60 minutes until DNA was in the size range of
100-500 bp. After centrifugation for 5 min at 16,100g at 4°C, the supernatant was
aliquoted, snap-frozen in liquid nitrogen, and stored at -80°C. 1 ml of chromatin
was diluted in 2 ml hybridization buffer (750 mM NaCl, 1% SDS, 50 mM Tris HCl,
pH 7.0, 1 mM EDTA, 15% formamide) and input RNA and DNA aliquots were
removed. 100 pmoles of probes (Supplementary Table 1) were added and mixed
by rotation at 37°C for 4 h. Streptavidin paramagnetic C1 beads (Invitrogen) were
equilibrated with lysis buffer. 100 µl washed C1 beads were added, and the entire
reaction was mixed for 30 min at 37°C. Samples were washed five times with 1 ml
of washing buffer (SSC 2x, 0.5% SDS and fresh PMSF). 10% of each sample was
removed from the last wash for RNA isolation. RNA aliquots were added to 85 µl
RNA PK buffer, pH 7.0, (100 mM NaCl, 10 mM TrisCl, pH 7.0, 1 mM EDTA, 0.5%
SDS, 0.2 U µl-1 proteinase K) and incubated for 45 min with end-to-end shaking.
Samples were spun down, and boiled for 10 min at 95°C. Samples were chilled on
ice, added to 500 µl TRIzol, and RNA was extracted according to the manufacturer's
recommendations. Equal volume of RNA was reverse-transcribed and assayed by
qPCR using Hprt and Morrbid-exon1-1 primer sets (Supplementary Table 1). DNA
was eluted from remaining bead fraction twice using 150 µl DNA elution buffer
(50 mM NaHCO3, 1% SDS, 200 mM NaCl, 100 µg ml-1 RNase A, 100 U ml-1 RNase H)
incubated for 30 min at 37°C. DNA elutions were combined and treated with 15 µl
(20 mg ml-1) Proteinase K for 45 min at 50°C. DNA was purified using phenol:
chloroform:isoamyl and assayed by qPCR using the indicated primer sequences
(Supplementary Table 1). 

shRNA generation and transduction. shRNAs of indicated sequences 
(Supplementary Table 1) were cloned into pGreen shRNA cloning and expression
lentivector. Pseudotyped lentivirus was generated as previously described,
and 293T cells were transfected with a packaging plasmid, envelope plasmid,
and the generated shRNA vector plasmid using Lipofectamine 2000. Virus was
collected 14-16 h and 48 h after transfection, combined, 0.4-µm filtered, and stored
at -80°C. For generation of in vivo BM chimaeras, virus was concentrated 6 times
by ultracentrifugation using an Optiprep gradient (Axis-Shield). 

For transduced BM-derived eosinophils, cultured BM cells on day 3 of previously
described culture conditions were mixed 1:1 with indicated lentivirus and
spinfected for 2 h at 260g at 25°C with 5 µg ml-1 polybrene. Cultures were incubated
overnight at 37°C, and media was exchanged for IL-5 containing media at
day 4 of culture as previously described’. Cells were sorted for GFP* cells on day 
5 of culture, and then cultured as previously described for eosinophil generation. 
Cells were assayed on day 11 of culture. 

For transduced in vivo BM chimaeras, BM cells were cultured at 2.5 x 10° cells 
per ml in mIL-3 (10 ng ml-1), mIL-6 (5 ng ml-1) and mSCF (100 ng ml-1) overnight
at 37°C. Culture was readjusted to 2 ml at 2.5 x 10° cells per ml in a 6-well
plate, and spinfected for 2 h at 260g at 25°C with 5 µg ml-1 polybrene. Cells were
incubated overnight at 37 °C. On the day before transfer, recipient hosts were irra- 
diated twice with 5 Gy 3 hours apart. Mice were analysed between 4 and 5 weeks 
following transfer. 

Locked nucleic acid knockdown. Bone marrow-derived macrophages (BMDMs) 
were transfected with pooled Morrbid or scrambled locked nucleic acid (LNA) 
antisense oligonucleotides of equivalent total concentrations using Lipofectamine 
2000. Morrbid LNA pools contained Morrbid LNA 1-4 sequences at a total of 50 or 
100 nM (Supplementary Table 1). After 24h, the transfection media was replaced. 
The BMDMs were incubated for an additional 24h and subsequently stimulated 
with LPS (250 ng ml-1) for 8-12 h.

Eosinophils were derived from mouse BM as previously described. On day 12 
of culture, 1 x 10° to 2 x 10° eosinophils were transfected with 50 nM of Morrbid
LNA 3 or scrambled LNA (Supplementary Table 1) using TransIT-oligo according 
to manufacturer’s protocol. RNA was extracted 48 h after transfection. 

Morrbid promoter deletion. Guide RNAs (gRNAs) targeting the 5’ and 3’ flanking 
regions of the Morrbid promoter were cloned into Cas9 vectors pSPCas9(BB)- 
2A-GFP(PX458) (Addgene plasmid 48138) and pSPCas9(BB)-2A-mCherry 
(a gift from the Stitzel lab, JAX-GM), respectively. gRNA sequences are listed in
Supplementary Table 1. The cloned Cas9 plasmids were then transfected into
RAW 264.7, a mouse macrophage cell line, using Lipofectamine 2000 according
to the manufacturer's protocol. Forty-eight hours after transfection, the double-positive
cells expressing GFP and mCherry and the double-negative cells lacking GFP and
mCherry were sorted. The bulk-sorted cells were grown in complete media containing
20% FBS and assayed for deletion by PCR, as well as for Morrbid and Bcl2l11
transcript expression by qPCR.

Ex vivo cytokine stimulation. BM-derived eosinophils, or neutrophils or Ly6Chi
monocytes sorted from mouse BM, were rested for 4–6 hours at 37 °C in complete
media. Cells were subsequently stimulated with IL-3 (10 ng ml−1, BioLegend), IL-5
(10 ng ml−1, BioLegend), GM-CSF (10 ng ml−1, BioLegend) or G-CSF (10 ng ml−1,
BioLegend) for 4–6 h. RNA was collected at each time point using TRIzol (Life
Technologies). 

GSK126 treatment. Wild-type and Bcl2l11−/− BM-derived eosinophils were generated
as previously described. On day 8 of culture, the previously described IL-5
media was supplemented with the indicated concentrations of the EZH2-specific
inhibitor GSK126 (Toronto Research Chemicals). Media was exchanged for fresh
media containing IL-5 and GSK126 every other day. Cells were assayed for numbers
and cell death by flow cytometry every day for 6 days following GSK126 treatment.

RNA extraction, cDNA synthesis and quantitative RT-PCR. Total RNA was
extracted using TRIzol (Life Technologies) according to the manufacturer's instructions.
Glycogen (ThermoFisher Scientific) was used as a carrier. Isolated RNA
was quantified by spectrophotometry, and RNA concentrations were normalized.
cDNA was synthesized using SuperScript II Reverse Transcriptase (ThermoFisher
Scientific) according to the manufacturer's instructions. Resulting cDNA was analysed
by SYBR Green (KAPA SYBR Fast, KAPABiosystems) or TaqMan-based
(KAPA Probe Fast, KAPABiosystems) qPCR using indicated primers. Primer sequences
are listed in Supplementary Table 1. All reactions were performed in duplicate
using a CFX96 Touch instrument (BioRad) or ViiA7 Real-Time PCR instrument
(ThermoFisher Scientific).

RNA-seq and conservation analysis. Reads generated from mouse (Gr1+) granulocytes
(previously published GSE53928), human neutrophils (previously published
GSE70068), and bovine peripheral blood leukocytes (previously published
GSE60265) were filtered, normalized, and aligned to the corresponding host 
genome. Reads mapping around the Morrbid locus were visualized. For visual- 
ization of the high level of Morrbid expression in short-lived myeloid cells, reads 
from sorted mouse eosinophils (previously published GSE69707), were filtered, 
aligned to mm9, normalized using RPKM, and gene expression was plotted in 
descending order. For each human sample corresponding to the indicated stim- 
ulation conditions, the number of reads mapping to the human MORRBID locus 
per total mapped reads was determined. 
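
The two normalizations named above (RPKM, and reads at a locus per total mapped reads) can be illustrated with a minimal sketch. This is not the pipeline used for the published analysis; the function names and any counts or gene lengths passed to them are hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # (reads / (length / 1e3)) / (total / 1e6) == reads * 1e9 / (length * total)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def locus_read_fraction(locus_reads, total_mapped_reads):
    """Fraction of all mapped reads that fall within a single locus."""
    if total_mapped_reads <= 0:
        raise ValueError("total mapped reads must be positive")
    return locus_reads / total_mapped_reads


def rank_genes_by_expression(counts, lengths_bp, total_mapped_reads):
    """Return (gene, RPKM) pairs sorted in descending order of RPKM."""
    values = {
        gene: rpkm(count, lengths_bp[gene], total_mapped_reads)
        for gene, count in counts.items()
    }
    return sorted(values.items(), key=lambda item: item[1], reverse=True)
```

Sorting by RPKM in descending order mirrors how the eosinophil transcriptome is plotted; the per-locus fraction is the quantity compared across the human stimulation conditions.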

For conservation across species, the genomic loci and surrounding genomic 
regions for the species analysed were aligned with mVista and visualized using 
the rankVista display generated with mouse as the reference sequence. Green 
highlights annotated mouse exonic regions and corresponding regions in other 
indicated species. 

RNA fluorescence in situ hybridization. Single molecule RNA fluorescence in situ 
hybridization (FISH) was performed as previously described. A pool of 44 oligo- 
nucleotides (Biosearch Technologies) were labelled with Atto647N (Atto-Tec). For 
validation purposes, we also labelled subsets consisting of odd and even numbered 
oligonucleotides with Atto647N and Atto700, respectively, and looked for colocal- 
ization of signal. We designed the oligonucleotides using the online Stellaris probe 
design software. Probe oligonucleotide sequences are listed in Supplementary 
Table 1. Thirty Z-sections with a 0.3-µm spacing were taken for each field of view.
We acquired all images using a Nikon Ti-E widefield microscope with a 100×
1.4 NA objective and a Pixis 1024BR cooled CCD camera. We counted the mRNA
in each cell by using custom image processing scripts written in MATLAB.

Cell fractionation. For nuclear and cytoplasmic fractionation, 5 × 10° BMDMs
were stimulated with 250 ng ml−1 LPS for 4 hours. Cells were collected and washed
once with cold PBS. Cells were pelleted, resuspended in 100 µl cold NAR A buffer
(10 mM HEPES, pH 7.9, 10 mM KCl, 0.1 mM EDTA, 1× complete EDTA-free
protease inhibitor, Sigma; 1 mM DTT, 20 mM β-glycerophosphate, 0.1 U µl−1
SUPERase In, Life Technologies), and incubated at 4 °C for 20 min. Then 10 µl of 1% NP-40
was added, and cells were incubated for 3 min at room temperature. Cells were
vortexed for 30 seconds and centrifuged at 3,400g for 1.5 min at 4 °C. Supernatant
was removed, centrifuged at full speed for 90 min at 4 °C, and the remaining supernatant
was added to 500 µl TRIzol as the cytoplasmic fraction. The original pellet was
washed 4 times in 100 µl NAR A with short spins of 6,800g for 1 min. The pellet
was resuspended in 50 µl NAR C (20 mM HEPES, pH 7.9, 400 mM NaCl, 1 mM
EDTA, 1× complete EDTA-free protease inhibitor, Sigma; 1 mM DTT, 20 mM
β-glycerophosphate, 0.1 U µl−1 SUPERase In, Life Technologies). Cells were vortexed
every 3 min for 10 s for a total of 20 min at 4 °C. The sample was centrifuged
at maximum speed for 20 min at room temperature. The remaining supernatant was
added to 500 µl TRIzol as the nuclear fraction. Equivalent volumes of cytoplasmic
and nuclear RNA were converted to cDNA using gene-specific primers and SuperScript
II RT (Life Technologies). Fractionation was assessed by qPCR for Morrbid-exon1-1
and other known cytoplasmic and nuclear transcripts. Primer sequences
are listed in Supplementary Table 1.

For cytoplasmic, nuclear, and chromatin fractionation, 5 × 10° to
10 × 10° immortalized macrophages were activated with 250 ng ml−1 LPS (Sigma)
for 6 hours at 37 °C. Cells were washed twice with PBS, and then resuspended in 380 µl
ice-cold HLB (50 mM Tris-HCl, pH 7.4, 50 mM NaCl, 3 mM MgCl2, 0.5% NP-40,
10% glycerol), supplemented with 100 U SUPERase In RNase Inhibitor (Life
Technologies). Cells were vortexed for 30 s and incubated on ice for 30 min, followed
by a final 30 s vortex and centrifugation at 1,000g for 5 min at 4 °C. Supernatant was
collected as the cytoplasmic fraction. Nuclear pellets were resuspended by vortexing
in 380 µl ice-cold MWS (50 mM Tris-HCl, pH 7.4, 4 mM EDTA, 0.3 M NaCl,
1 M urea, 1% NP-40) supplemented with 100 U SUPERase In RNase Inhibitor.
Nuclei were lysed on ice for 10 min, vortexed for 30 s, and incubated on ice for 10
more min to complete lysis. Chromatin was pelleted by centrifugation at 1,000g for
5 min at 4 °C. Supernatant was collected as the nucleoplasmic fraction. RNA was
collected as described previously and cleaned up using the RNeasy kit (Qiagen).
Equivalent volumes of cytoplasmic, nucleoplasmic, and chromatin-associated
RNA were converted to cDNA using random hexamers and SuperScript III RT
(Life Technologies). Fractionation was assessed by qPCR for Morrbid-exon1-2 and
other known cytoplasmic and nuclear transcripts. Primer sequences are listed in
Supplementary Table 1.

Copy number analysis. Morrbid cDNA was cloned into a reference plasmid
(pCDNA3.1) containing a T7 promoter. The plasmid was linearized and
Morrbid RNA was in vitro transcribed using the MEGAshortscript T7 kit (Life
Technologies), according to the manufacturer's recommendations, and purified
using the MEGAclear kit (Life Technologies). RNA was quantified using spectrophotometry,
and serial dilutions of Morrbid RNA of calculated copy number
were spiked into Morrbid-deficient RNA isolated from Morrbid-deficient mouse
spleen. Samples were reverse transcribed in parallel with wild-type sorted neutrophil
RNA and B-cell RNA isolated from known cell number using gene-specific
Morrbid primers, and the Morrbid standard curve and wild-type neutrophils and
B cells were assayed using qPCR with Morrbid-exon 1 primer sets (Supplementary
Table 1).
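
The arithmetic behind a spike-in standard curve of this kind can be sketched as follows. This is an illustration under stated assumptions, not the calculation used for the published data: it assumes that the qPCR readout is a threshold cycle (Ct) that is linear in log10 of the input copy number, and that the number of cell equivalents loaded per reaction is known. All function names and example values are hypothetical.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    if len(copies) != len(ct_values) or len(copies) < 2:
        raise ValueError("need at least two matched standards")
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copies in one reaction."""
    return 10 ** ((ct - intercept) / slope)


def copies_per_cell(ct, slope, intercept, cell_equivalents):
    """Copies per cell, given the cell equivalents loaded per reaction."""
    if cell_equivalents <= 0:
        raise ValueError("cell equivalents must be positive")
    return copies_from_ct(ct, slope, intercept) / cell_equivalents
```

Fitting the curve on the spiked-in standards and inverting it for the neutrophil and B-cell samples gives copies per reaction; dividing by the cell equivalents gives the per-cell estimate.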

Bromodeoxyuridine incorporation assay. Cohorts of mice were given a total of
4 mg bromodeoxyuridine (BrdU; Sigma Aldrich) in 2 separate intraperitoneal (i.p.)
injections 3 h apart and monitored over the subsequent 5 days, unless otherwise
noted. For analysis, cells were stained according to the manufacturer's protocol (BrdU
Staining Kit, eBioscience; anti-BrdU, BioLegend). A one-phase exponential curve
was fitted from the peak labelling frequency to 36 h after peak labelling within each
genetic background, and the half-life was determined from this curve.
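
A half-life estimate of this kind can be sketched as below. This is a simplified illustration, not the fitting procedure used for the published data: it assumes a decay of the form y = y0·exp(−k·t) with a plateau of zero, fits it by linear regression on ln(y), and takes time in hours after the peak of labelling. The function names are hypothetical.

```python
import math


def fit_one_phase_decay(times_h, labelled_pct):
    """Fit y = y0 * exp(-k * t) by least squares on ln(y).

    Assumes a plateau of zero (an assumption of this sketch).
    Returns (y0, k) with k in 1/h.
    """
    if len(times_h) != len(labelled_pct) or len(times_h) < 2:
        raise ValueError("need at least two matched time points")
    if any(y <= 0 for y in labelled_pct):
        raise ValueError("labelling frequencies must be positive")
    logs = [math.log(y) for y in labelled_pct]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_l = sum(logs) / n
    stt = sum((t - mean_t) ** 2 for t in times_h)
    stl = sum((t - mean_t) * (l - mean_l) for t, l in zip(times_h, logs))
    slope = stl / stt
    y0 = math.exp(mean_l - slope * mean_t)
    return y0, -slope


def half_life(k):
    """Half-life of an exponential decay with rate constant k."""
    if k <= 0:
        raise ValueError("decay constant must be positive")
    return math.log(2) / k
```

Restricting the input to the window from peak labelling to 36 h afterwards, as described above, keeps the fit on the decaying part of the curve.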

Human samples. Human subject cohort 1. Study subjects were recruited and consented
in accordance with the University of Pennsylvania Institutional Review
Board. Peripheral blood was separated by Ficoll-Paque density gradient centrifugation,
and the mononuclear cell layer and erythrocyte/granulocyte pellet
were isolated and stained for fluorescence-activated cell sorting as previously
described. Sorted populations were neutrophils (live, CD16+F4/80intCD3−CD14−CD19−),
eosinophils (live, CD16−F4/80hiCD3−CD14−CD19−), T cells (live, CD3+CD16−),
monocytes (live, CD14+CD3−CD16−CD56−), natural killer (NK) cells (live,
CD56+CD3−CD16−CD14−) and B cells (live, CD19+CD3−CD16−CD14−CD56−).
Human subject cohort 2. Samples from human subjects were collected on NIAID
IRB-approved research protocols to study eosinophilic disorders (NCT00001406)
or to provide controls for in vitro research (NCT00090662). All participants gave
written informed consent. Eosinophils were purified from peripheral blood by
negative selection and frozen at −80 °C in TRIzol (Life Technologies). Purity was
>97% as assessed by cytospin. RNA was purified according to the manufacturer's
instructions. Expression analysis by qPCR was performed in a blinded manner by
an individual not involved in sample collection or coding of these samples.
Plasma IL-5 levels were measured by suspension array in multiplex (Millipore).
The minimum detectable concentration was 0.1 pg ml−1.

Cell lines. RAW 264.7 cells were obtained from ATCC and were not authenti- 
cated, but were tested for mycoplasma contamination biannually. Immortalized 
C57/B6 macrophages were obtained as a generous gift from I. Brodsky. These 
cells were not authenticated, but were tested for mycoplasma contamination 
biannually. 

Statistics. Sample sizes were estimated based on our preliminary phenotyping
of Morrbid-deficient mice. Preliminary cell number analysis of eosinophils, neutrophils
and Ly6Chi monocytes suggested that there were very large differences
between wild-type and Morrbid-deficient samples, which would allow statistical
interpretation with relatively small numbers. No statistical methods were used
to predetermine sample size. No animals were excluded from analysis. All experimental
and control mice and human samples were run in parallel to control for
experimental variability. The experiments were not randomized. Experiments
corresponding to Fig. 3g–i and Fig. 4g–j were performed and analysed in a single-blinded
manner. All other experiments were not blinded to allocation during
experiments and outcome assessment. Correlation was determined by calculating
the Spearman correlation coefficient. Half-life was estimated by calculating the
one-phase exponential decay constant from the peak of labelling frequency to
36 h after peak labelling. P values were calculated using a two-sided t-test, Mann–Whitney
U-test, one-way ANOVA with Tukey post-hoc analysis, Kaplan–Meier
Mantel–Cox test, and false discovery rate (FDR) as indicated. FDR was calculated
using trimmed mean of M-values (TMM)-normalized read counts and the
DiffBind R package as described in Extended Data Fig. 7c, d. All error bars indicate
mean plus and minus the standard error of mean (s.e.m.).
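
Two of the summary statistics named above, the Spearman correlation coefficient and the s.e.m., can be written out in a few lines. This is a minimal sketch using only the Python standard library, not the software used for the published analysis; the function names are hypothetical, and the s.e.m. here uses the sample standard deviation (n − 1 denominator), which is an assumption.

```python
import math
import statistics


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (n >= 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def sem(values):
    """Standard error of the mean using the sample standard deviation."""
    return statistics.stdev(values) / math.sqrt(len(values))
```

Computing the coefficient on ranks rather than raw values makes it insensitive to monotonic transformations of either variable.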

Extended Data Figure 1 | Morrbid transcript expression, localization,
and conservation across species. a, Left: mouse, human and cow
Morrbid transcripts. Human neutrophil, mouse granulocyte and cow
peripheral blood RNA-seq data are represented as read density around
the Morrbid transcript of each species. Right: the Morrbid loci and
surrounding genomic regions of the indicated species were aligned
with mVista and visualized using the rankVista display generated with
mouse as the reference sequence. Green highlights annotated mouse
exonic regions and corresponding regions in other indicated species.
b, Quantification of Morrbid FISH spots per indicated cell population.
Cells were stained with Morrbid RNA probes conjugated to 2 different
fluorophores, and spots colocalizing in both fluorescent channels were
quantified. c, Cytoplasmic and nuclear subcellular RNA fractionation
of LPS-stimulated BMDMs with qPCR of indicated target transcripts
(n = 3 macrophages generated from independent mice). d, Cytoplasmic,
nuclear and chromatin subcellular RNA fractionation of LPS-stimulated
immortalized BMDMs with qPCR of indicated target transcripts
(average of 4 independent experiments). e, Mature eosinophil
transcriptome sorted in descending order of log(RPKM) gene expression,
with annotated select reported eosinophil-associated genes. f, Average
number of Morrbid RNA copies per cell in sorted neutrophils and B cells.
Left: standard curve generated using in vitro transcribed Morrbid RNA
spiked into Morrbid-deficient RNA isolated from spleen. Right: calculated
per cell Morrbid RNA copies (n = 3 replicates from independent mice).
g, Representation of CRISPR-Cas9 targeting of the Morrbid locus with
indicated guide RNA (gRNA) sequences and genotyping primer sets.
Target gRNA sequences are bolded. h, Cells isolated from the blood of
wild-type mice. Representative flow cytometry plots demonstrating
the gating strategy for neutrophils (CD45+CD11b+Ly6G+), T cells
(CD45+Ly6G−CD3+), B cells (CD45+Ly6G−CD3−CD19+), eosinophils
(CD45+CD3−CD19−Ly6G−SiglecF+SSChi), Ly6Chi monocytes
(CD45+CD3−CD19−Ly6G−SSCloSiglecF−Ly6ChiCSF-1R+) and NK cells
(CD45+CD3−CD19−Ly6G−SSCloSiglecF−CSF-1R−NK1.1+). i, Total cell
numbers of the indicated cell populations isolated from the spleen of
wild-type and Morrbid-deficient mice (n = 3–5 mice per group, results
representative of 8 independent experiments). Error bars show s.e.m.
*P < 0.05, **P < 0.01 and ***P < 0.001 (two-sided t-test, c, f, i; one-way
ANOVA with Tukey post-hoc analysis, d).

Extended Data Figure 2 | Myeloid cell populations in tissue following
Morrbid deletion, and blood and spleen following Morrbid knockdown
in vivo. a, Representative flow cytometry plots and absolute counts of
the indicated cell populations in wild-type and Morrbid-deficient mice
(n = 3–5 mice per group, representative of 3–7 independent experiments).
b, shRNA knockdown of Morrbid RNA relative to control vector in BM
transduced with the indicated GFP vector, sorted on GFP, differentiated
into eosinophils and assessed by qPCR (each dot represents eosinophils
generated from independent mice). c, Schematic of control and Morrbid
shRNA1 BM chimaera generation. d, e, Frequency of indicated cell
populations within total GFP+ transduced cells from blood (d) and spleen
(e) (n = 3–4 mice per transduction group). f–h, Wild-type and Morrbid-deficient
mice challenged with papain or PBS. f, Absolute numbers of
indicated cell populations in lung tissue and bronchoalveolar lavage
(BAL). g, qPCR expression in lung tissue. h, Representative haematoxylin
and eosin (H&E) and periodic acid–Schiff (PAS) stain lung histology
at 40× magnification (n = 3–4 mice per group; representative of two
independent experiments). Error bars show s.e.m. *P < 0.05, **P < 0.01
and ***P < 0.001 (two-sided t-test, a, b, d, e; Mann–Whitney U-test, f, g).

Extended Data Figure 3 | Morrbid regulation of mature neutrophils,
eosinophils and Ly6Chi monocytes is cell intrinsic. a–e, Morrbid-deficient
competitive BM chimaera generation. a, Schematic of mixed BM
chimaera generation. Congenically labelled wild-type CD45.1+CD45.2+
and Morrbid-deficient CD45.2+ BM cells were mixed 1:1 and injected into
an irradiated CD45.1+ host. b, Ratio of mixed congenically labelled
wild-type CD45.1+CD45.2+ and Morrbid-deficient CD45.2+ BM cells
before injection into an irradiated CD45.1+ host. c, d, Ratio of Morrbid-deficient
to wild-type short-lived myeloid and control immune cells in blood
(c) and representative flow cytometry plots of these cell populations (d).
e, Morrbid-deficient to wild-type ratio of additional immune cell
populations (n = 4–8 mice per group; pooled from two independent
experiments). f, Schematic of myeloid differentiation and Morrbid qPCR
expression in the indicated sorted progenitor and mature cells (n = 3–5
mice per group; representative of 3 independent experiments). g, Cells
isolated from the BM of wild-type mice. Representative flow cytometry
plots demonstrating the gating strategy for common myeloid progenitor
(CMP): lineage− (Sca1, CD11b, GR-1, CD3, Ter-119, CD19, B220, NK1.1),
IL7Ra−C-kit+CD34+CD16/32lo; granulocyte/monocyte progenitor
(GMP): lineage−IL7Ra−C-kit+CD34+CD16/32hi; monocyte/dendritic cell
progenitor (MDP): lineage−IL7Ra−C-kit+CD115+CD135+; eosinophil
progenitor (EosP): lineage−IL7Ra−C-kit+CD34+CD16/32hiIL-5Ra+.
h, Cells isolated from the BM of wild-type mice. Representative flow
cytometry plots demonstrating the gating strategy for eosinophils:
dump− (dump: CD3, NKp46, Ter119, CD19, Ly6G, Sca1),
CSF-1R−C-kit−/loSiglecF+SSChi; monocytes: dump−CSF-1R+C-kit−MHCII−Ly6Chi;
common monocyte progenitor (CMoP): dump−CSF-1R+C-kit+Ly6C+CD11b−.
Flow cytometry count beads are visualized and gated
by forward and side scatter area (g, h). Error bars show s.e.m. *P < 0.05,
**P < 0.01 and ***P < 0.001 (one-way ANOVA with Tukey post-hoc
analysis).

Extended Data Figure 4 | Morrbid regulates neutrophil, eosinophil and
Ly6Chi monocyte lifespan through cell-intrinsic regulation of Bcl2l11.
a, Flow cytometric analysis of percentage of BrdU incorporation in the
indicated wild-type and Morrbid-deficient immune cell populations from
blood. Mice were analysed 24 h after one dose of 2 mg BrdU (n = 3 mice
per group). b, Representative flow cytometry plots and absolute counts
of mature eosinophils (live, CD45+SSChiCD11b+SiglecF+) of BM-derived
eosinophil culture on day 12 in wild-type and Morrbid-deficient
mice (n = 3 mice per group, results representative of 3 independent
experiments). c, Morrbid expression of developing wild-type BM-derived
eosinophils at indicated time points of in vitro culture (n = 3
mice per group). d, Percentage of annexin V+ wild-type and Morrbid-deficient
BM cell populations at indicated time points of ex vivo culture
(n = 3 mice per group; data are representative of two independent
experiments). e, Percentage of annexin V+ eosinophils (gated on annexin
V+CD45+SSChiCD11b+SiglecF+) of BM-derived eosinophil culture on
day 12 in wild-type and Morrbid-deficient mice (n = 3 mice per group,
results representative of 3 independent experiments). f, Percentage of
annexin V+ wild-type and Morrbid-deficient neutrophils and Ly6Chi
monocytes 4 days after L. monocytogenes infection (n = 3 mice per
group, representative of 2 independent experiments). g, Flow cytometric
analysis of percentage and absolute number of blood neutrophils from
wild-type or Morrbid-deficient mice that were pulsed two times with
2 mg BrdU 3 h apart and monitored over 5 days (n = 4 mice per group;
data are representative of three independent experiments). h, Western
blot analysis of BCL2L11 protein expression in wild-type and Morrbid-deficient
sorted BM neutrophils. i, BCL2L11 protein expression measured
by flow cytometry in blood neutrophils from wild-type, Morrbid-deficient
and Bcl2l11-deficient mice (n = 1–4 mice per group). j, k, BCL2L11
protein expression in mixed BM chimaera model. Quantification of
mean fluorescence intensity (MFI) of BCL2L11 protein expression in
indicated cell populations from blood (j) and BM (k) (n = 4–8 mice
per group, results representative of two independent experiments).
l, BCL2L11 protein expression in the indicated progenitors and mature
cell types from wild-type and Morrbid-deficient mice. 'n/a' indicates
that too few cells were present for MFI quantification (n = 3–5 mice per
group, results representative of 3 independent experiments). m, BCL2L11
expression measured in the indicated cell populations from wild-type and
Morrbid-deficient mice (n = 3, results representative of two independent
experiments). Error bars show s.e.m. *P < 0.05, **P < 0.01 and ***P < 0.001
(two-sided t-test).

Extended Data Figure 6 | Knockdown of Morrbid leads to Bcl2l11
upregulation and cell death. a, Schematic of shRNA-transduced BM-derived
eosinophil system. b–d, In vitro shRNA BM-derived eosinophil
competitive chimaera. b, Schematic of transduction of CD45.2+ and
CD45.1+ BM cells transduced with GFP scrambled shRNA or GFP
Morrbid-specific shRNA lentiviral vectors, respectively. GFP+ cells
were sorted, mixed 1:1, differentiated into eosinophils, and analysed by
flow cytometry. c, Representative histogram and MFI quantification of
BCL2L11 expression of mature eosinophils separated by congenic marker.
d, Percentage of contribution of each congenic BM to the total mature
eosinophil pool (n = 3 mice per group, each dot represents eosinophils
differentiated from the BM of 1 mouse, representative of 2 independent
experiments). e, Morrbid and Bcl2l11 expression of wild-type BM-derived
eosinophils transfected with Morrbid-specific LNA 3 and control LNA
(each dot represents the average of 2–3 biological replicates, data pooled
from 5 independent experiments). f, Morrbid and Bcl2l11 expression of
wild-type and Morrbid-deficient BM-derived macrophages at the indicated
time points following LPS stimulation. Expression is represented as
fold change from time 0 (t0) (n = 3 mice per group, representative of 3
independent experiments). g–i, LPS-stimulated BM-derived macrophages
transfected with pooled Morrbid-specific (LNA 1–4) or scrambled
(cntrl LNA) antisense LNAs. g, Morrbid and Bcl2l11 qPCR expression;
h, annexin V+ expression; i, absolute BM-derived macrophage numbers
(n = 3 mice per group, representative of 6 independent experiments).
j–l, Morrbid promoter deletion in immortalized BMDMs. j, Diagram of
Morrbid promoter targeting in immortalized BMDMs using CRISPR-Cas9.
Immortalized BMDMs were transfected with GFP-expressing
Cas9 and Cherry-expressing gRNA vectors of the indicated sequences.
k, l, GFP+/Cherry+ and GFP−/Cherry− expressing cells were sorted and
assayed at the bulk level using PCR for verification of promoter deletion
using the indicated primers (j, k) and qPCR for Morrbid and Bcl2l11
expression following LPS stimulation for 6 hours (l) (n = 3 LPS-stimulated
cultures, average of 3 independent experiments). m, Morrbid and
Bcl2l11 transcript expression in wild-type and Morrbid-deficient sorted
BM-derived neutrophils stimulated with G-CSF for 4 h. Expression is
represented as fold change from unstimulated (n = 3 mice, representative
of 2 independent experiments). Error bars show s.e.m. *P < 0.05,
**P < 0.01 and ***P < 0.001 (two-sided t-test).

Extended Data Figure 7 | Epigenetic effect of Morrbid deletion on
its surrounding genomic region. a, ChIP-qPCR analysis of total Pol
II enrichment within the Bcl2l11 promoter and gene body in wild-type
and Morrbid-deficient neutrophils. Results are represented as
Bcl2l11 enrichment relative to control Actb enrichment within each
sample. Each dot represents 1–2 pooled mice. b, ChIP-qPCR analysis
of EZH2 enrichment within the Bcl2l11 promoter in wild-type and
Morrbid-deficient BMDMs stimulated with LPS for 12 hours. Results are
represented as Bcl2l11 enrichment relative to control MyOD1 enrichment
within each sample (n = 3, each dot represents BMDMs generated from
1 mouse). c, Relative chromatin accessibility levels at the Bcl2l11, Acoxl,
Anapc1 and Mertk promoters in Morrbid−/− and wild-type neutrophils as
assessed by ATAC-seq. Chromatin accessibility levels were estimated as an
average trimmed mean of M-values (TMM)-normalized read count across
the replicates. Statistics were obtained by differential open chromatin
analysis using the DiffBind R package. The Bcl2l11 promoter is more open
in Morrbid−/− neutrophils, with a 1.52-fold change and an FDR of <0.1%.
ND (not detected) indicates that no peak was present at the indicated
promoter. d, Density plot of log2 fold-change distribution for H3K4me1,
H3K4me3, H3K27ac and H3K36me3 levels between Morrbid−/− and wild-type
neutrophils. Relative fold changes are estimated as the ratio of TMM-normalized
read counts within consensus peak regions and were obtained
using the DiffBind R package. Positive and negative fold changes indicate
higher levels of ChIP binding in Morrbid−/− and wild-type neutrophils,
respectively. Dashed green lines show the 5th and 95th percentiles. The
green triangles on the x axis mark the change at the Bcl2l11 promoter or
gene body between wild-type and Morrbid−/− neutrophils. e, f, ATAC-seq
and ChIP-seq for H3K4me1, H3K4me3, H3K27ac and H3K36me3
chromatin modifications were performed on neutrophils sorted from
the bone marrow of wild-type and Morrbid-deficient mice. ATAC-seq
and ChIP-seq are represented as read density surrounding the Morrbid
locus (e) and at the Bcl2l11 locus (f). ATAC-seq tracks are expressed as
reads normalized to total reads, and chromatin modification tracks are
expressed as reads normalized to input. Error bars show s.e.m. *P < 0.05,
**P < 0.01 and ***P < 0.001 (two-sided t-test, a, b; FDR of fold change as
described above, c, d).

Extended Data Figure 8 | Morrbid represses Bcl2l11 by maintaining
its bivalent promoter in a poised state and phenotype of Morrbid
heterozygous mice. a, Venn diagram summary of EZH2 PAR-CLIP
analysis, with representation of tags and RNA-protein contact sites as
determined by PARalyzer mapping to Morrbid. RNA contact sites (RCS)
are displayed in red. b, Co-immunoprecipitation of the PRC2 family
member EZH2 and Morrbid. Nuclear extracts of immortalized wild-type
BMDMs stimulated with LPS for 6–12 h were immunoprecipitated by
IgG or anti-EZH2 antibodies. Co-precipitation of indicated RNAs was
assayed by qPCR. Data are represented as enrichment over IgG control
(n = 6 biological replicates pooled from 2 independent experiments,
representative of 3 independent experiments). c, Validation of Morrbid
RNA pull-down over other RNAs using pools of Morrbid capture
probes and LacZ probes (n = 3, average of 3 independent experiments).
d, Visualized 3C PCR products from bait and indicated reverse primers
using template from fixed and ligated BM-derived eosinophil DNA (S1, S2
and S3), BAC control (BAC) or water. The sequence of each reverse primer
is listed in Supplementary Table 1. e, f, BM-derived eosinophils from wild-type
and Bcl2l11−/− mice treated with EZH2 inhibitor GSK126 over time.
Frequency of non-viable (Aqua) (e) and annexin V (f) staining cells on
day 5 following treatment with GSK126 (n = 3 independently differentiated
eosinophils per dose, results representative of 2 independent experiments).
g, Total cell numbers (top) and BCL2L11 protein expression (bottom)
of indicated cell populations from the blood of wild-type, Morrbid-heterozygous
and Morrbid-deficient mice (n = 3–5 mice per group, results
representative of 3 independent experiments). Error bars show s.e.m.
*P < 0.05, **P < 0.01 and ***P < 0.001 (two-sided t-test, c, g; one-way
ANOVA with Tukey post-hoc analysis, e, f; Mann–Whitney U-test, b).


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


Extended Data Figure 9 panel labels (image not reproduced): Bcl2l11 and Morrbid loci; Allele 1, Morrbid Δ; Allele 2, WT; gRNA1 and gRNA2 target sites; genotyping gels for F1 in cis × WT and F1 in trans × WT crosses, with Morrbid WT, Morrbid Δ, Bcl2l11 WT and Bcl2l11 Δ bands.

gRNA1: 5′ ACTCTAGAATTCTTAACAAAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAG
gRNA2: 5′ AAGTTTGTGTCGTGAATTTAGTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAG
TCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTT 3′

Bcl2l11 primers: 5′ TCCGATTTAGTTCCACTCTGC 3′; 5′ TTCCTTTTACATGCCTGGTG 3′

Extended Data Figure 9 | Generation of Morrbid-Bcl2l11 double
heterozygous mice. Diagram of allele-specific CRISPR-Cas9 targeting of
Bcl2l11. Bcl2l11 was targeted using indicated gRNA sequences in one-cell
embryos from a wild-type by Morrbid-deficient breeding. F1 mice with
allele-specific Bcl2l11 deletions in cis or in trans of the Morrbid-deficient
allele were bred to a wild-type background to demonstrate linkage or
segregation of Bcl2l11 and Morrbid knockout alleles. Second-rightmost
lanes of both gels contain Morrbid−/−Bcl2l11+/+ DNA, and rightmost
lanes contain water, as internal controls.

Extended Data Figure 10 | Morrbid regulates Bcl2l11 in an allele-specific
manner and working model of the role of Morrbid. a, Diagram
of the allele-specific combinations of Morrbid- and Bcl2l11-deficient
heterozygous mice studied. b, Representative flow cytometry plots of
indicated splenic cell populations in the specified allele-specific deletion
genetic backgrounds. Neutrophils (CD45+CD11b+Ly6G+), monocytes
(CD45+CD3−CD19−Ly6G−SSCloSiglecF−Ly6ChiCSF-1R+) and B cells
(CD45+Ly6G−CD3−CD19+). Wild-type (WT), Morrbid heterozygote
(Het), Bcl2l11 heterozygote and Morrbid heterozygote with deletions
in trans (Trans), Bcl2l11 heterozygote and Morrbid heterozygote with
deletions in cis (Cis). c, d, Absolute counts (c) and BCL2L11 protein
expression (d) of indicated splenic cell populations in the specified genetic
backgrounds (n = 3–9 mice per genetic background). e, Morrbid integrates
extracellular signals to control the lifespan of eosinophils, neutrophils
and classical monocytes through the allele-specific regulation of Bcl2l11.
Pro-survival cytokines induce Morrbid, which promotes enrichment of the
PRC2 complex within the bivalent Bcl2l11 promoter through direct and
potentially indirect mechanisms to maintain this gene in a poised state.
Tight control of the turnover of these short-lived myeloid cells by Morrbid
promotes a balance of host anti-pathogen immunity with host damage
from excess inflammation. Error bars show s.e.m. *P < 0.05, **P < 0.01
and ***P < 0.001 (one-way ANOVA with Tukey post-hoc analysis).
PionX sites mark the X chromosome for dosage compensation

Raffaella Villa1, Tamas Schauer1, Pawel Smialowski2, Tobias Straub2 & Peter B. Becker1


The rules defining which small fraction of related DNA sequences 
can be selectively bound by a transcription factor are poorly 
understood. One of the most challenging tasks in DNA recognition 
is posed by dosage compensation systems that require the 
distinction between sex chromosomes and autosomes. In Drosophila 
melanogaster, the male-specific lethal dosage compensation complex 
(MSL-DCC) doubles the level of transcription from the single male 
X chromosome, but the nature of this selectivity is not known1.
Previous efforts to identify X-chromosome-specific target sequences
were unsuccessful as the identified MSL recognition elements lacked
discriminative power2,3. Therefore, additional determinants such as
co-factors, chromatin features, RNA and chromosome conformation
have been proposed to refine targeting further4. Here, using an
in vitro genome-wide DNA binding assay, we show that recognition 
of the X chromosome is an intrinsic feature of the MSL-DCC. MSL2, 
the male-specific organizer of the complex, uses two distinct DNA 
interaction surfaces—the CXC and proline/basic-residue-rich 
domains—to identify complex DNA elements on the X chromosome. 
Specificity is provided by the CXC domain, which binds a novel 
motif defined by DNA sequence and shape. This motif characterizes 
a subclass of MSL2-binding sites, which we name PionX (pioneering 
sites on the X) as they appeared early during the recent evolution of 
an X chromosome in D. miranda and are the first chromosomal sites 
to be bound during de novo MSL-DCC assembly. Our data provide 
the first, to our knowledge, documented molecular mechanism 
through which the dosage compensation machinery distinguishes 
the X chromosome from an autosome. They highlight fundamental 
principles in the recognition of complex DNA elements by protein 
that will have a strong impact on many aspects of chromosome 
biology. 

Previous work suggested that MSL2 may tether the MSL-DCC to 
DNA and that an intact CXC domain is required for X-chromosome 
discrimination5,6. To assess the DNA-binding specificity intrinsic
Figure 1 | Genome-wide MSL2 in vitro binding partially recapitulates
the in vivo pattern. a, Chromosomal distribution of robust in vivo and
in vitro MSL2 binding peaks, each determined by two independent
experiments. The DIP-seq profile of heat shock factor (HSF)8 and
the relative size of the chromosomes (genome) serve as references for
to MSL2 comprehensively, we surveyed the Drosophila genome for
MSL2-binding sites in vitro by DNA immunoprecipitation (DIP)7,8.
Recombinant MSL2 was incubated with sheared genomic DNA 
(gDNA) purified from male Drosophila S2 cells. MSL2-bound DNA 
was recovered and sequenced. 

Considering the lack of X-chromosome binding selectivity seen in 
previous in vitro studies, we did not expect to find that MSL2 pref- 
erentially retrieved DNA from distinct genomic loci, with a notable 
enrichment of sequences from the X chromosome (Fig. 1a). On the 
X chromosome, the MSL2 binding pattern was remarkably similar to 
the in vivo pattern that marks the positions of high-affinity binding 
sites (HAS; or chromatin entry sites) of the MSL-DCC (Fig. 1b). A 
total of 57 DIP sites coincided with in vivo HAS, although they show 
different signal intensities (Extended Data Fig. 1a, b). The results were
similar if DIP followed by sequencing (DIP-seq) was performed with 
gDNA extracted from female cells or synthesized in vitro by whole- 
genome amplification (excluding the contribution of male-specific 
RNA contaminants or DNA modifications) (Extended Data Fig. 1c, d). 
It therefore appears that recombinant MSL2 has an intrinsic ability to 
enrich X-chromosomal sequences from complex genomic DNA. 

We next assessed the contribution of the three known MSL2 domains 
to DNA binding (Fig. 2a and Extended Data Fig. 2a). Deletion of 
the RING finger domain that mediates MSL2 interaction with 
MSL1 (ref. 9) and contains E3 ligase activity10 had no obvious effect
(Fig. 2b, c). Unexpectedly, however, deletion of a region rich in proline
and basic amino acid residues (the Pro/Bas domain) that may bind
RNA11 resulted in the complete loss of DNA binding (Fig. 2b).

Upon deletion or mutation of the CXC domain, binding to a sub- 
set of sites was much reduced (Fig. 2b, c). Statistical analyses revealed 
56 regions that specifically required a functional CXC domain for 
binding. Notably, these ‘CXC-dependent’ sites displayed a higher 
enrichment on the X chromosome (Fig. 2d and Extended Data 
Fig. 2b). A total of 37 sites mapped to MSL2 in vivo peaks (HAS) on 
uniform distribution. b, Representative profiles of MSL2 chromatin
immunoprecipitation with sequencing (ChIP-seq) and DIP-seq 
experiments in a 1.3-Mb window on the X chromosome. Red and blue bars 
indicate the positions of robust peaks. Gene models are depicted in grey at 
the bottom. 
Figure 2 | The CXC domain of MSL2 increases X-chromosomal
specificity. a, Linear representation of MSL2 domain organization and
mutant proteins assayed in DIP. Point mutants in the CXC domain
included a single (R543A) and double (R526A/R543A) mutant version.
b, Representative DIP-seq profiles of wild-type and mutated MSL2
proteins in a region around the hiw gene. Red bars indicate HAS.
c, Clustered heat map of DIP signals in all MSL2 in vitro peak regions. The
red bar indicates a group of sites that show a prominent loss in signal upon 
CXC mutation. For some proteins, two independent replicates are shown. 
d, X-chromosomal enrichment over autosomes of robust wild-type and 
mutant MSL2 DIP-seq peaks. CXC indicates the peaks that significantly 
(false discovery rate (FDR) < 0.05) lose binding upon CXC depletion or 
mutation. The x-axis labels indicate the total number of peaks for each 
target in brackets. 


the X chromosome, and 2 sites corresponded to rare cases of autosomal
sites that show MSL2 enrichment in vivo (Extended Data Fig. 1e,
Extended Data Table 1 and Supplementary Table 1). Our data suggest
that MSL2 interacts with DNA via two domains, CXC and Pro/Bas,
and that the CXC domain is the major determinant of the selectivity
for the X chromosome. While binding-site specificity can be achieved
by cooperation between different transcription factors12, our finding
suggests that cooperation between two different DNA-binding surfaces 
within this one protein may also refine its overall binding specificity. 
Sequence analyses within CXC-dependent and CXC-independent 
binding sites for MSL2 yielded two distinct motifs. Whereas the 
CXC-independent binding sites shared low-complexity GA repeats 
(Extended Data Fig. 3a), the CXC-dependent peaks centre around 
a more complex variation of the MSL response element (MRE), 
with a notable 5’ extension (Figs 3a, 4c and Extended Data Fig. 3c). 
Remarkably, this novel motif can predict in vivo MSL2 binding (HAS) 
better than the MRE, as its position weight matrix (PWM) is supe- 
rior in classifying whether MRE hit regions overlap HAS (Fig. 3b and 
Extended Data Fig. 3b, d). Applying low thresholds (q < 0.2) we found 
2,667 instances of this motif throughout the genome (Supplementary 
Table 1), with an approximately twofold enrichment on the X chro- 
mosome. Higher-scoring matches to the consensus sequence tend to 
be more strongly enriched on the X chromosome. For example, the 
34 best matches are 9.8-fold enriched on the X chromosome (Extended 
Data Fig. 3e, f). However, 18 of those instances were not bound in vitro 
by MSL2 in a CXC-dependent manner, indicating that the recognition 
sequence represented by a PWM cannot fully explain this binding mode. 
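
As a rough illustration of the position weight matrix (PWM) scoring referred to here, the following Python sketch scores candidate sites against a log-odds matrix and reports the best window in a sequence. The count matrix, the uniform background and the pseudocount are invented for illustration; they are not the PionX or MRE matrices, and the genome-wide searches in this study were run with FIMO rather than with this code.

```python
import math

# Hypothetical 4-position count matrix (one dict per motif position).
# The counts are invented for illustration; this is not the PionX motif.
COUNTS = [
    {"A": 1, "C": 8, "G": 0, "T": 1},
    {"A": 9, "C": 0, "G": 1, "T": 0},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 2, "C": 0, "G": 8, "T": 0},
]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform


def log_odds_matrix(counts, background, pseudocount=1.0):
    """Convert per-position counts to log2 odds against the background."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background[base])
            for base in "ACGT"
        })
    return matrix


def pwm_score(matrix, site):
    """Sum per-position scores; every base contributes independently."""
    if len(site) != len(matrix):
        raise ValueError("site length must equal motif length")
    return sum(row[base] for row, base in zip(matrix, site))


def best_hit(matrix, sequence):
    """Slide the motif along the sequence; return (score, offset) of the best window."""
    width = len(matrix)
    return max(
        (pwm_score(matrix, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


PWM = log_odds_matrix(COUNTS, BACKGROUND)
```

Because the score is a plain sum over positions, two sites with the same score are indistinguishable to the PWM even if they differ in features that depend on neighbouring bases, which is the limitation the text points to.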
Figure 3 | The CXC domain reads out nucleotide sequence and
additional features. a, Motif discovered by the MEME motif-discovery
tool in CXC-dependent binding regions (E-value = 3.9 × 10⁻¹⁰⁸).
b, Receiver operating characteristic (ROC) curves representing the
performance of MRE and the new motif PWMs in predicting whether
genomic MRE instances (35,659) overlap with a HAS (266). Areas under
the curves (AUCs) are provided in brackets. c, A list of oligonucleotides
used in DIP experiments. Nucleotides highlighted in red are mutations
introduced based on the predictions of our classification model. d, DIP
experiments using synthetic DNA representing wild-type or mutated
binding sites in the genomic context. Results from qPCR amplification
were normalized for their input and shown as enrichment over an
unbound fragment. Data are mean ± s.e.m. for 3 biological replicates.
e, Individual feature importance evaluated on sets of CXC-dependent
motif instances defined by increasing score thresholds. For each feature a
ROC analysis on CXC-dependent binding was performed. The AUCs of
all features were scaled from 0 to 100 at each threshold level. Only features
which ranked at least twice among the top five are reported. Numbers
of instances (CXC-bound, non-CXC-bound) are provided underneath
the x-axis tick labels. f, DIP experiments using the wild-type CG1492
sequence and a mutant in which the DNA roll at position +1 was reduced
by mutating positions −1 and +2. Results from qPCR amplification were
normalized for their input and shown as enrichment over an unbound
fragment. Data are mean ± s.e.m. for 4 biological replicates.



PWMs model the base readout of DNA sequences with the implicit 
assumption that each nucleotide at a given position contributes to 
binding independently of other positions. Physical interactions of 
neighbouring base pairs, however, alter the structural conformation 
of the DNA double helix (often referred to as the DNA shape), which 
may manifest as variations in the minor groove width, roll, helix twist, 
and propeller twist. Many proteins depend on both base identity 
and localized helix shape to recognize their binding site13–15. Using a
pentamer-based model built from all-atom Monte Carlo simulations
of DNA structures16, we calculated DNA shape parameters at each base
position of the low-stringency motif hits, with 20-base-pair (bp) extensions
on either side. To complement these position-centred features we
also calculated regional mono- and dinucleotide frequencies (k-mers)
in 4-bp windows along the hit sequences. Principal component analysis
(PCA) revealed that a combination of DNA shape and k-mer features 
was able to separate the two classes of sequences: those that were bound 
in a CXC-dependent manner (CXC-bound) and those that were not 
bound in a CXC-dependent manner (non-CXC-bound). Sequences 
in the latter group were either not bound at all (2,502) or were bound 
independently of the CXC domain (111) (Extended Data Fig. 4a and 
Supplementary Table 2). This suggested that at least some of the DNA 
features might improve binding prediction. Indeed, classification mod- 
els constructed with our additional feature sets performed much better 
than a PWM-score model in predicting CXC-dependent binding sites 
on all motif hits (Extended Data Fig. 4b). 
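
The regional k-mer features mentioned above can be sketched as follows. This is a minimal Python illustration, assuming overlapping 4-bp windows advanced one base at a time; the text does not state whether windows were sliding or tiled, and the function name and arguments are hypothetical.

```python
from collections import Counter


def window_kmer_frequencies(sequence, k, window=4, step=1):
    """Relative k-mer frequencies inside each window along a sequence.

    k=1 gives mononucleotide and k=2 dinucleotide frequencies.
    Returns one dict per window, keyed by the k-mers observed in it.
    """
    if not 1 <= k <= window:
        raise ValueError("k must be between 1 and the window size")
    features = []
    for start in range(0, len(sequence) - window + 1, step):
        chunk = sequence[start:start + window]
        # All overlapping k-mers inside this window.
        kmers = [chunk[i:i + k] for i in range(window - k + 1)]
        counts = Counter(kmers)
        features.append({kmer: n / len(kmers) for kmer, n in counts.items()})
    return features
```

For example, `window_kmer_frequencies("GAGAC", 2)` yields one dict for the window GAGA and one for AGAC; such per-window values can then be placed alongside per-position shape values in a single feature table for PCA or classification.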

Guided by the good performance of our classification model using 
both the PWM-hit-score and k-mer features, we predicted mutations 
that would convert robust CXC-bound sites to non-CXC-bound sites. 
The model suggested that the best discriminating residues would local- 
ize to the 5’ part of the motif and not to the GA-rich region (Fig. 3c). 
To test these predictions, we modified the DIP experiment by mix- 
ing appropriately diluted DNA oligonucleotides, representing either a 
native site or its mutated version, into the genomic DNA. The efficiency 
of DNA retrieval of experimental oligonucleotides and control genomic 
loci was quantified by quantitative PCR (qPCR). The results confirmed 
our predictions (Fig. 3d), leading us to conclude that the main determi- 
nants for CXC-dependent binding reside within the first eight bases at 
the 5’ end of the consensus motif. Notably, this is the part of the motif 
that diverges most from the MRE. 

To achieve a switch in the predicted class from CXC-bound to non- 
bound in the context of the unbalanced data set of low-threshold 
instances (54 bound sites, 2,613 non-bound sites) required at least 
three mutations. This inevitably affected the motif score, making it 
difficult to distinguish the effects of base and shape readout. To reduce 
class imbalance and to evaluate the contribution of shape features to 
CXC-dependent binding of sequences with high similarity to the motif 
consensus, we limited the analysis to fewer sites through the stepwise 
increase of motif score thresholds. Figure 3e reveals the relative success 
of the PWM score compared with a selection of additional features in 
predicting CXC-dependent binding. In a more balanced data set of 
motifs consisting of the 95 best motif hits (29 sequences CXC-bound, 
66 non-bound) the PWM score was no longer a good predictor and 
DNA shape features became increasingly relevant. In particular, ‘roll 
at position +1’ (that is, the roll between the first two base pairs of the 
motif) turned into the best-performing predictor when sequences with 
PWM scores higher than 21 were considered. We therefore focused on
the 34 highest-scoring motif sequences (q < 0.05), which are highly 
enriched on the X chromosome; however, only 16 of them are bound in 
a CXC-dependent manner by MSL2. We systematically scanned these 
high-scoring sequences for statistically significant shape differences 
between the CXC-bound and non-bound classes at any nucleotide 
position (Extended Data Fig. 4d). The results confirmed that ‘roll at 
position +1’ was appreciably different between the two classes. To test 
experimentally the importance of this feature we changed the degree 
of roll at position +1 from >4° to <−2° by either replacing cytosine
at position +1 or the two adenines at positions −1 and +2. These
mutations led to a clear reduction in MSL2 binding (Fig. 3c, d, f). We
were also able to convert a sequence that was not efficiently bound
by MSL2 into one that was by changing the roll at position +1 from
<−1.9° to >4° (Extended Data Fig. 4c). Adding the DNA shape feature
‘roll at position +1’ to our PWM-hit-score classification model
resulted in substantially improved performance when applied to the 
complete list of 2,667 motif hits (Extended Data Fig. 4b). We therefore 
conclude that the ability of MSL2 to distinguish true binding sites from 
a large collection of irrelevant elements with highly related sequences 
also relies on structural features. 

To investigate further the role of MSL2-binding sites in X chromo- 
some dosage compensation, we first attempted to monitor the interac- 
tions of MSL2 with HAS in vivo, with minimal contributions from other 
DCC subunits. Genetic studies had shown that the assembly of a mature
MSL-DCC bound to the non-coding roX RNA in male flies is compromised
by inactivating the RNA helicase maleless (MLE). Under those
circumstances, the remaining MSL2–MSL1 sub-complex is bound to
a small subset of HAS17. We recreated this scenario in S2 cells by using
RNA interference (RNAi) against mle expression, and found that MSL2 
binding was preferentially retained at HAS, corresponding to CXC- 
dependent binding sites (Extended Data Fig. 5b). The 25 HAS that were 
most resistant to MLE depletion (Fig. 4a) revealed a shared sequence, 
bearing a strong resemblance to the CXC-dependent motif (Fig. 4c). 
By contrast, the 25 sites most sensitive to MLE depletion (bound only 
by the complete DCC) shared a GA motif similar to the one found 
in CXC-independent in vitro binding sites (Extended Data Fig. 5a). 
This suggests that under physiological conditions, the MSL2-MSL1 
sub-complex directly contacts a subset of HAS in a CXC-dependent 
manner in the absence of associated protein and RNA subunits. 

It is possible that these chromosomal interactions represent an inter- 
mediate of MSL-DCC assembly. To test this hypothesis, we initiated 


Figure 4 | The CXC-dependent sites are 
pioneer HAS. a, Representative profiles of 
MSL2 ChIP-seq from S2 cells treated with 
RNAi against GST (control) or mle. Red bars 
indicate binding sites that are maintained in 
the absence of MLE. b, MSL2 signal on 37 HAS 
matching CXC-dependent in vitro binding sites 
(PionX) or 272 non-matching ones (non-PionX) 
during SXL knockdown in Kc cells. Signals
were averaged across 4 biological replicates and
normalized to the mean signal at time point 0.
Curves depict mean and s.e.m. across all sites
within one class. c, Comparison of motifs
found in MSL2-bound regions using different
experimental approaches. See main text for
details. Shown are the top scoring motifs except
for ‘all (in vitro)’, which places second after a
low-complexity GA-repeat similar to Extended
Data Fig. 3a. d, Schematic representation of the
10-bp deletion that generated PionX motifs on
the D. miranda neo-X chromosome21.
de novo MSL-DCC assembly in female Kc cells by reducing the
expression of the sex-lethal gene Sxl. The SXL protein prevents MSL2
expression and thus the dosage compensation program in female cells. 
Upon depletion of SXL (Extended Data Fig. 5c, d), binding of newly 
expressed MSL2 to CXC-dependent HAS was stronger and occurred 
earlier when compared to CXC-independent ones (Fig. 4b). Consistent 
with this finding, hierarchical clustering of MSL2 signals from the 
common set of Kc and S2 peak regions revealed 30 sites that acquire 
strong MSL2 binding ability 3 days after SXL depletion (Extended 
Data Fig. 5d). De novo motif discovery on these sites revealed a con- 
sensus sequence that resembles the one in the CXC-dependent sites 
(Fig. 4c). Our data strongly suggest that those sites identified in vitro 
as CXC-dependent are pioneering binding sites for MSL2 in vivo. We 
therefore refer to them as PionX sites, and to their defining motif as 
the PionX motif. 

The notion that PionX sites are important for dosage com- 
pensation is further supported by evolutionary considerations. 
Drosophila miranda represents a unique system to study how 
newly evolving X chromosomes acquired dosage compensation. 
The D. miranda neo-X chromosome is a sex chromosome that 
began to evolve just 1 million–2 million years ago18. Owing to the
relatively short evolutionary time span, the neo-X chromosome still
retains many autosomal features, but has already acquired partial
dosage compensation. Recent work has identified the MSL-DCC-
binding sites on all D. miranda X chromosomes19. De novo motif
analysis yielded the typical GA-rich MREs for the older, fully compensated
X-chromosomal arms XL and XR. Notably, though, the
consensus sequence derived from the neo-X chromosome clearly
resembled the PionX signature19 (Fig. 4c).

The neo-Y chromosome originated from the fusion of one Müller-C
chromosome to the Y chromosome, resulting in evolutionary
pressure on the second Müller-C chromosome to become the neo-X
chromosome. We found the PionX motif (but not the MRE) to be particularly
enriched on the D. miranda neo-X chromosome but not on the
related Drosophila pseudoobscura Müller-C autosome, supporting the
idea that this motif represents a new X-chromosome-specific feature
(Extended Data Fig. 5e). Careful comparison of neo-X-chromosome
sequences with the homologous regions in D. pseudoobscura revealed that
the novel MSL-DCC-binding sites were acquired by diverse molecular
mechanisms, including point mutations and short insertions/deletions
of precursor sequences20. About half of them originated from precursor
sequences contained in a D. miranda-specific helitron transposon21.
The homologous neo-Y helitron does not contain PionX motifs, only
precursor sequences in which the 5’ CAC motif and the 3’ GA-rich
element are separated. On the neo-X chromosome these two parts
are fused by a 10-bp deletion to form PionX consensus motifs19,21
(Fig. 4d). The insertion of a PionX consensus motif derived from the
D. miranda neo-X chromosome into an autosome of D. melanogaster
led to strong, ectopic binding of the MSL-DCC. By contrast, the corresponding
homologous neo-Y-chromosome sequence, in which the
5’ and GA sequences are split by a 10-bp insertion, did not recruit the
complex18,21. A similar experiment used the strongest DCC-binding
site from the neo-X chromosome and the corresponding neo-Y-
chromosomal fragment, showing MSL-DCC recruitment to the former,
but not to the latter18. While the neo-Y-chromosomal fragment does
not contain a PionX motif, the evolved neo-X chromosome contains 
nine of them (Extended Data Fig. 5f). Collectively, these observations 
suggest that PionX motifs play an important role in de novo acquisition 
of dosage compensation. 

In summary, we provide three lines of argument suggesting that 
PionX sites are X-chromosome-specific determinants that function 
early in the series of events that lead to exclusive targeting of the X chro- 
mosome and correct dosage compensation. First, PionX sites are bound 
by an MSL2-MSL1 sub-complex in the absence of all other subunits, 
a state that may reflect an early intermediate of MSL-DCC assembly at 
HAS. Second, PionX sites are the first to be occupied during de novo 
establishment of dosage compensation. Finally, PionX motifs arose
during the early phase of neo-X-chromosome evolution in D. miranda. 

A pertinent conceptual advance from our study is the understanding 
that not all HAS contain the same amount of information. The sub- 
set of PionX sites are not necessarily sites of highest MSL2 occupancy 
in vivo (Extended Data Fig. 1b), but contribute an important qualitative 
element of X-chromosomal discrimination. This discrimination is not 
wholly apparent from the consensus motif as it also relies on the shape 
of the DNA at the MSL2-binding site. 

The initial recruitment of MSL2 to PionX sites on the X chromosome 
may trigger the distribution of the complex to nearby non-PionX HAS 
within the chromosomal territory, thereby further amplifying the 
difference in MSL2 occupancy between the X chromosome and the 
autosomes. It is likely that other factors contribute to the stability of 
the targeting system in vivo, such as the cooperativity of MSL2 domains 
within what is presumed to be a dimeric complex22; the assembly of
functional complexes within the X-chromosomal territory owing to
transcription of roX RNA from the X chromosome23; synergistic interactions
between different MSL-DCC complexes and with the CLAMP
protein at clustered MREs24; and a supportive organization of the
conformation of the X chromosome25.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and investigators were not blinded to allocation during 
experiments and outcome assessment. 

Protein purification. MSL2 proteins were expressed in Sf21 cells and purified by 
Flag affinity chromatography as described?. 

Genomic DNA preparation. The pellet from 6 × 10⁷ S2 or Kc cells was suspended
in 1.2 ml of lysis buffer (10 mM Tris pH 8, 100 mM NaCl, 25 mM EDTA pH 8, 0.5%
SDS, 0.15 mg ml⁻¹ proteinase K) and incubated at 56 °C overnight. After addition
of sodium acetate to a final concentration of 0.3 M, the nucleic acids were extracted
with phenol-chloroform and precipitated with an equal volume of isopropanol at
−20 °C for 1 h. Precipitated nucleic acids were centrifuged and washed with 70%
ethanol. Dried pellets were resuspended in TE buffer and sonicated with Covaris
AFA S220 (microTUBEs, peak incident power 175 W, duty factor 10%, cycles per
burst 200, 430 s) to generate 200-bp fragments. After RNase digestion (0.1 mg ml⁻¹,
1 h at 37 °C), DNA was purified with the GenElute kit (Sigma). Synthetic DNA was
produced using the Repli-g kit (Qiagen) with 20 ng of gDNA as starting material.

DIP-seq. DIP-seq experiments were performed as in ref. 7 with a few modifications.
In brief, 400 ng of gDNA was incubated with either 80 nM of MSL2-Flag
or mutated recombinant protein at 26 °C for 30 min in 100 µl of binding buffer
(100 mM KCl, 2 mM MgCl₂, 2 mM Tris-HCl pH 7.5, 10% glycerol, 10 µM ZnCl₂).
For DIP experiments in the presence of synthetic DNA, 10 pM of the specified
synthetic DNA was added to the reaction. Ten per cent of each reaction was taken as input
material and subjected to quantitative PCR and/or deep sequencing. DNA-protein
complexes were immunoprecipitated using 15 µl of Flag bead slurry (M2, Sigma)
for 15 min at room temperature and washed twice with 100 µl of binding buffer to
eliminate unbound DNA. After digestion with proteinase K (0.5 mg ml⁻¹, 1 h at
56 °C), DNA was purified with the GenElute kit (Sigma) and subjected to qPCR
and/or deep sequencing. The DIP experiments in the presence of synthetic DNA were
performed using the deltaRING construct (three different protein preps).

Cells, RNAi, ChIP-seq. All cells used in this study were authenticated by
karyotyping and staining for the MSL-DCC and were regularly tested for mycoplasma
contamination.

Double-stranded RNAi fragments were generated from PCR products
obtained using the following oligonucleotides:
mle RNAi: 5’-TTAATACGACTCACTATAGGGAGAATGGATATAAAATCTTTTTTGTACCAATTTTG-3’;
5’-TTAATACGACTCACTATAGGGAGAACAGGGCGCATGACTTGCT-3’.
Sxl RNAi-1: 5’-TAATACGACTCACTATAGGGAGAGATCACAGCCGCTGTCC-3’;
5’-TAATACGACTCACTATAGGGAGATACCGAATTAAGAGCAAATAATAA-3’.
Sxl RNAi-2: 5’-TAATACGACTCACTATAGGGAGACCCTATTCAGAGCCATTGGA-3’;
5’-TAATACGACTCACTATAGGGAGAGTTATGGTACGCGGCAGATT-3’.

The culture of Drosophila male S2 (subclone L2-4, provided by P. Heun) and female
Kc cells and RNAi against mle and Sxl were performed as previously described?
with modifications. At days 3, 6 and 9 after the initial treatment with dsRNA, Sxl
RNAi cells were split and either collected for ChIP experiments and western blot
analyses or treated again with Sxl dsRNA. For S2 cells, ChIP experiments were
performed using a Covaris AFA S220 (PIP 100 W, DF 20%, 200 CB for 30 min) to
generate chromatin fragments of sizes averaging 180 bp. For Kc cells, ChIP experiments
were performed as before26 with modifications. In brief, about 4 × 10⁷ cells
were suspended in ice-cold homogenization buffer and fixed with 1% formaldehyde
for 10 min at room temperature. After quenching with 125 mM glycine, the
cells were collected and washed three times with ice-cold RIPA buffer (1% Triton
X-100, 0.1% Na deoxycholate, 0.1% SDS, 140 mM NaCl, 10 mM Tris-HCl pH 8,
1 mM EDTA). Fixed nuclei were sonicated in RIPA buffer with a Covaris sonifier
(PIP: 140, DF 20%, CB: 200) for 30 min.

Antibodies. MSL2, MLE and Lamin antibodies were previously described!. The 
SXL antibody was obtained from F. Gebauer. 

Library preparation and sequencing. The Diagenode MicroPlex library kit 
was used to prepare libraries from 1-2 ng of input, DIP or ChIP DNA quantified 
using the Qubit dsDNA HS Assay kit (Life Technologies Q32851). The libraries
were sequenced on a HiSeq 1500 (Illumina) instrument to yield roughly
15 million–25 million reads of 50-bp single-end sequences per sample.

Oligonucleotides. Double-stranded synthetic DNA fragments were obtained by
annealing equimolar concentrations (10 µM) of complementary oligonucleotides.
All oligonucleotides used in the DIP studies are listed in Extended Data Table 2. 
Data analysis. If not indicated otherwise, data were processed using R 
(http://www.r-project.org) or Bioconductor (http://www.bioconductor.org) and
function calls with default parameters. For hierarchical clustering of binding sites 
based on MSL2 signals, we applied the complete method on Euclidean distances. 
Read processing, coverage and normalized coverage. Sequence reads were 
aligned to the D. melanogaster release 6 reference genome using Bowtie27 version
1.1.1 allowing only for single matches to the reference (parameter -m 1).
We extended the matched reads to a total of 200 bp and calculated for each sample
a per-base genomic coverage vector by cumulating the total spans of all sequenced 
fragments. 
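
A minimal Python sketch of the coverage step just described, assuming reads are supplied as 0-based 5’ positions with a strand and that each read is extended downstream from its 5’ end. The function and argument names are illustrative and are not taken from the authors' pipeline.

```python
def coverage_vector(reads, chrom_length, fragment_length=200):
    """Per-base coverage from single-end reads extended to a fixed length.

    reads: iterable of (five_prime_position, strand), 0-based, strand "+" or "-".
    Fragments running past either chromosome end are clipped.
    """
    # Difference array: +1 where a fragment starts, -1 just past its end.
    diff = [0] * (chrom_length + 1)
    for five_prime, strand in reads:
        if strand == "+":
            begin, end = five_prime, five_prime + fragment_length
        else:
            begin, end = five_prime - fragment_length + 1, five_prime + 1
        begin, end = max(begin, 0), min(end, chrom_length)
        if begin < end:
            diff[begin] += 1
            diff[end] -= 1
    # The running sum of the difference array is the per-base coverage.
    coverage, running = [], 0
    for delta in diff[:chrom_length]:
        running += delta
        coverage.append(running)
    return coverage
```

The difference-array form keeps the cost linear in the number of reads plus the chromosome length, instead of incrementing every covered base for every fragment.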

We defined target signal enrichment as the standardized difference between
normalized immunoprecipitate and corresponding normalized input coverage
using:

$$\text{normalized coverage}_i = \arcsin\sqrt{\frac{\text{coverage}_i}{\sum_j \text{coverage}_j}}$$

in which i denotes genomic position, coverage denotes the number of fragments
covering i, and the sum runs over all genomic positions j.
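
The normalization can be sketched in Python as follows, assuming the formula is the arcsine of the square root of each position's share of the total coverage. The enrichment shown is the plain difference between normalized immunoprecipitate and input; how the difference was standardized is not specified in the text, so that step is left out. Function names are illustrative.

```python
import math


def normalized_coverage(coverage):
    """Arcsine square-root transform of each position's share of total coverage."""
    total = sum(coverage)
    if total == 0:
        raise ValueError("coverage vector is empty or all zero")
    return [math.asin(math.sqrt(c / total)) for c in coverage]


def signal_enrichment(ip_coverage, input_coverage):
    """Unstandardized difference of normalized IP and normalized input coverage."""
    if len(ip_coverage) != len(input_coverage):
        raise ValueError("IP and input vectors must have the same length")
    ip = normalized_coverage(ip_coverage)
    inp = normalized_coverage(input_coverage)
    return [a - b for a, b in zip(ip, inp)]
```

Dividing by the total makes samples of different sequencing depth comparable, and the arcsine square-root transform is a common variance-stabilizing choice for proportions.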

Peak calling, definition of robust peak sets and chromosomal enrichment. 
Peaks were called using Homer28 findPeaks version 4.7.2 with the parameters:
style = factor; size = 200; fragLength = 200; inputFragLength = 200 and C = 0. All
peaks were called using the corresponding input samples as controls. We defined 
peaks as robust if the region was called in at least two biologically replicated sam- 
ples. X-chromosomal enrichment describes the ratio of X-chromosomal peak 
density to autosomal peak density, with density being the number of peaks divided 
by length of chromosome(s). 
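
The enrichment ratio defined here can be written out as a short Python sketch. The chromosome name `chrX` and the dictionary layout are assumptions made for illustration; any chromosome present in `lengths` other than the X is treated as an autosome, so the caller should pass only the chromosomes that are meant to be compared.

```python
def peak_density(peak_counts, lengths, chromosomes):
    """Number of peaks divided by the summed length of the given chromosomes."""
    n_peaks = sum(peak_counts.get(c, 0) for c in chromosomes)
    total_bp = sum(lengths[c] for c in chromosomes)
    return n_peaks / total_bp


def x_enrichment(peak_counts, lengths, x_name="chrX"):
    """Ratio of X-chromosomal to autosomal peak density."""
    autosomes = [c for c in lengths if c != x_name]
    autosomal = peak_density(peak_counts, lengths, autosomes)
    if autosomal == 0:
        raise ValueError("no autosomal peaks; enrichment is undefined")
    return peak_density(peak_counts, lengths, [x_name]) / autosomal
```

A value of 1 means peaks are distributed in proportion to chromosome length; values above 1 indicate a preference for the X chromosome.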

Definition of CXC-dependent sites (PionX). We first defined a robust set of 
in vitro MSL2-binding regions by combining the peaks called in the two MSL2 
DIP-seq experiments performed on S2 gDNA and the one performed on Kc 
gDNA. We calculated the average signal enrichment over the input for the profiles 
described in Fig. 2c. We then tested for signal differences in samples with intact 
CXC domain against the ones with a deleted or mutated CXC domain using a linear 
model (R package, limma) including the cell type (origin of gDNA) as random 
effect. CXC-dependent sites were defined with an FDR threshold of < 0.05 and 
a fold change <0. 
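
The final selection step can be illustrated in Python. This sketch covers only the multiple-testing and filtering part and assumes that the FDR values are Benjamini-Hochberg adjusted P values and that a negative log fold change means weaker signal when the CXC domain is deleted or mutated. The linear model itself (limma, with cell type as a random effect) is not reproduced, and the site names and numbers in the usage note are invented.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def cxc_dependent(sites, fdr=0.05):
    """Select sites with adjusted P < fdr and negative log fold change.

    sites: list of (name, log_fold_change, p_value) tuples.
    """
    q_values = benjamini_hochberg([p for _, _, p in sites])
    return [
        name
        for (name, lfc, _), q in zip(sites, q_values)
        if q < fdr and lfc < 0
    ]
```

With the toy input `[("a", -1.2, 0.01), ("b", 0.8, 0.04), ("c", -0.5, 0.03), ("d", -2.0, 0.5)]`, only site `a` passes both the FDR and the direction filter.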

De novo motif discovery and genome-wide motif searches. We searched for 
enriched motifs in peak regions using MEME29, with the zero or one occurrence
per sequence (zoops) model, except for searches in D. miranda. Here we applied
the any number of repetitions (anr) model, given the extreme amplification of 
motifs in some of the peak regions. Genome-wide searches were performed with 
FIMO” version 4.10.0 and an initial P-value cutoff of 1 x 10~°. 

Definition of HAS and MRE. MSL2 in vivo peaks were called on two published 
high-quality profiles (GEO accession codes GSM929148 and GSM929149). A total 
of 309 overlapping peak regions (304 on the X chromosome, 5 on the autosomes) 
were defined as HAS. MEME-based de novo motif discovery using the zoops model 
yielded a consensus sequence that we refer to as MRE. This MRE closely matches 
the original definition” (Extended Data Fig. 3c). 

Performance comparison of MRE and PionX PWM. On each of the 36,410
genome-wide MRE hits we calculated the score of the PionX PWM after extending
the hit region by 2 bp 5' to the MRE consensus. We then determined the overlap
of the hit regions with the 309 HAS. If more than one MRE hit matched to
the same HAS, we kept the hit with the highest PionX score. We determined the
ROC of each PWM by continuous thresholding of the respective scores using the
match to a HAS as response. The analysis comprises 35,659 instances mapping to
266 HAS.
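
The two steps described above (keeping the best-scoring hit per HAS, then sweeping a score threshold to obtain the ROC) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the input layout (score, HAS identifier or None) and the toy scores are assumptions.

```python
def best_hit_per_site(hits):
    """hits: iterable of (score, site_id), site_id is None for hits outside any HAS.

    Keeps every non-overlapping hit, but only the highest-scoring hit
    for each HAS. Returns a list of (score, overlaps_has) pairs.
    """
    best = {}
    kept = []
    for score, site in hits:
        if site is None:
            kept.append((score, False))
        elif site not in best or score > best[site]:
            best[site] = score
    kept.extend((score, True) for score in best.values())
    return kept


def roc_auc(labelled):
    """ROC points and trapezoidal AUC from (score, is_positive) pairs.

    The threshold is lowered step by step through all observed scores;
    tied scores are processed together. Requires at least one positive
    and one negative instance.
    """
    pos = sum(1 for _, y in labelled if y)
    neg = len(labelled) - pos
    ordered = sorted(labelled, key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j][0] == ordered[i][0]:
            if ordered[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    auc = sum((x1 - x0) * (y0 + y1) / 2
              for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return points, auc


# Hypothetical toy hits: two hits fall on site "a", one on site "b".
toy_hits = [(9.0, "a"), (7.0, "a"), (8.0, None), (3.0, None), (6.0, "b"), (1.0, None)]
toy_kept = best_hit_per_site(toy_hits)
toy_points, toy_auc = roc_auc(toy_kept)
```

In the toy input the lower-scoring duplicate on site "a" is discarded, which mirrors how the number of instances drops when several hits map to the same HAS.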

Definition of mle RNAi-resistant sites. Average MSL2 enrichment was calculated 
on all 309 robust MSL2 in vivo binding sites (HAS) in control and MLE RNAi 
ChIP-seq samples (3 biological replicates each). We then tested for difference in 
signals between the two experimental groups using limma (R). We defined the 
most resistant as the 25 sites with the highest moderated t values. Accordingly, the 
25 sites with the lowest t values were defined as most sensitive sites. 
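
The selection step can be illustrated with a short sketch. It assumes that the moderated t values have already been computed elsewhere (for example with limma in R) and are available as a mapping from site name to t value; the function name and the toy values are hypothetical.

```python
def most_resistant_and_sensitive(t_values, n=25):
    """Split sites by moderated t value.

    t_values: dict mapping site name -> moderated t value (precomputed).
    Returns (resistant, sensitive): the n sites with the highest t values
    and the n sites with the lowest t values (lowest first).
    Assumes len(t_values) >= 2 * n so the two sets do not overlap.
    """
    ranked = sorted(t_values, key=t_values.get, reverse=True)
    resistant = ranked[:n]
    sensitive = list(reversed(ranked[-n:]))
    return resistant, sensitive


# Hypothetical toy values, for illustration only.
toy_t = {"s1": 3.0, "s2": -1.0, "s3": 0.5, "s4": 2.0, "s5": -2.5}
toy_resistant, toy_sensitive = most_resistant_and_sensitive(toy_t, n=2)
```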

Definition of strong MSL2-binding sites upon Sxl RNAi in Kc cells. Average 
MSL2 enrichment was calculated for all 309 HAS including the 90 robust peak 
regions arising in Kc cells (4 biological replicates for the Sxl RNAi at each time 
point, 2 for the controls). The signals were clustered hierarchically. Two clusters 
with the strongest gain were combined to constitute a set of 30 sites. 

DNA shape calculation and extended feature description. The initial set of 
regions subjected to extended feature analysis was defined by applying low thresh- 
olds (q < 0.2) on FIMO motif searches for the PionX motif. We obtained 2,667 hits 
(Supplementary Table 1), 54 of which were bound in a CXC-dependent manner 
by MSL2, 111 bound in a CXC-independent manner by MSL2 and 2,502 were 
unbound in our in vitro DIP-seq experiments. We refer to unbound instances as 
well as CXC-independently bound ones as non-CXC-bound. 

DNA shape parameters were calculated with the DNAshape program!°. While 
minor groove width and propeller twist refer to specific nucleotide positions, roll 
and helix twist specify structural parameters between adjacent bases. For the sake 
of simplicity, we assigned the values of roll and helical twist to the preceding base 
(‘roll at position 1’ actually specifies the roll between the bases of nucleotides at 
position 1 and 2). 
The total set of features considered in this study (with the number of variables
in brackets) comprises: the PWM-hit-score (1), nucleotide composition at each
position from −20 bp to +20 bp around the motif (244), minor groove width
from −18 bp to +18 bp around the motif (57), roll from −19 bp to +18 bp around
the motif (58), helix twist from −19 bp to +18 bp around the motif (58), propeller
twist from −18 bp to +18 bp around the motif (57), nucleotide frequencies in six
consecutive 4-bp windows starting at position 1 of the motif (24), dinucleotide
frequencies in six consecutive 4-bp windows starting at position 1 of the motif (96).
Minor groove width, roll, helix twist and propeller twist constitute the shape
features (230). Mono- and dinucleotide frequencies constitute the k-mer features (120).
The total number of features was 595.
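
As an illustration of the k-mer features, the sketch below computes mono- and dinucleotide frequencies in six consecutive 4-bp windows. It is a minimal sketch under stated assumptions: the function name and feature labels are hypothetical, the input sequence is assumed to start at position 1 of the motif, and dinucleotides are counted as the three overlapping pairs within each window (the text does not specify how they were counted).

```python
from itertools import product

NUCLEOTIDES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]


def window_kmer_features(seq, n_windows=6, width=4):
    """Mono- and dinucleotide frequencies per window.

    seq must cover n_windows * width bases, starting at motif position 1.
    Returns a dict such as {"A_w1": 0.25, "CG_w1": 0.33, ...}.
    """
    seq = seq.upper()
    if len(seq) < n_windows * width:
        raise ValueError("sequence shorter than the requested windows")
    features = {}
    for w in range(n_windows):
        window = seq[w * width:(w + 1) * width]
        for nuc in NUCLEOTIDES:
            features[f"{nuc}_w{w + 1}"] = window.count(nuc) / width
        # Overlapping dinucleotides inside the window (assumption).
        pairs = [window[i:i + 2] for i in range(width - 1)]
        for dinuc in DINUCLEOTIDES:
            features[f"{dinuc}_w{w + 1}"] = pairs.count(dinuc) / len(pairs)
    return features


# Hypothetical 24-bp toy sequence, not taken from the study.
toy_features = window_kmer_features("ACGT" * 6)
```

Six windows with 4 mononucleotide and 16 dinucleotide frequencies each give 24 + 96 = 120 k-mer features, which together with the PWM score (1), the nucleotide composition (244) and the shape features (230) add up to the 595 features listed above.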

Machine learning. Classification models for feature evaluation were built using
a simple logistic classifier (ref. 30). ROC curves of the classifiers were based
on tenfold cross-validation.

The importance of all features was ranked by measuring their correlation
(Pearson's) with the class label on the whole set of PionX PWM hits with a q < 0.2.
Features selected as relevant for and present at CXC-dependent binding of the hiw,
CG8097 and CG1492 genes were: 'CA in window 1', 'TT at window 2' and 'T at
window 2'. Mutations were proposed based on the results of feature selection and
the presence of respective k-mers in the sites selected for mutation. The proposed
mutations were evaluated by a simple logistic classifier trained using the PWM score
and k-mers on the full data set. Modified sites were designed to result in the switch
of the predicted class from CXC-bound to not CXC-bound.
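
The feature ranking by Pearson correlation with the class label can be sketched as below. This is an illustrative sketch, not the authors' implementation: the function names and toy data are hypothetical, and ranking by the absolute value of the coefficient is an assumption (the text does not state whether sign was taken into account).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for constant input."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_features(feature_columns, labels):
    """Rank features by |r| with the class label (1 = CXC-bound, 0 = not).

    feature_columns: dict mapping feature name -> list of values per site.
    Returns a list of (name, r) sorted by decreasing absolute correlation.
    """
    scores = {name: pearson(col, labels) for name, col in feature_columns.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical toy data: four sites, three features.
toy_columns = {"f1": [1, 2, 3, 4], "f2": [1, 0, 1, 0], "f3": [4, 3, 2, 2]}
toy_labels = [0, 0, 1, 1]
toy_ranking = rank_features(toy_columns, toy_labels)
```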


26. Schauer, T. et al. CAST-ChIP maps cell-type-specific chromatin states in the
Drosophila central nervous system. Cell Reports 5, 271-282 (2013).
28. Heinz, S. et al. Simple combinations of lineage-determining transcription
factors prime cis-regulatory elements required for macrophage and B cell
identities. Mol. Cell 38, 576-589 (2010).
Acids Res. 37, W202-W208 (2009).
30. Sumner, M., Frank, E. & Hall, M. Speeding Up Logistic Model Tree Induction
675-683 (Springer, 2005).


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


Extended Data Figure 1 | Analysis of in vitro versus in vivo MSL2-binding
sites. a, Venn diagram showing the genome-wide overlap of robust MSL2
in vivo and in vitro DNA binding peaks. b, MSL2 enrichment
(immunoprecipitate (IP) over input) of all 57 overlapping peaks from
in vitro DIP-seq and in vivo ChIP-seq experiments. The average of two
biological replicates is shown, and the Pearson correlation coefficient
is indicated. c, X-chromosomal enrichment over autosomes of MSL2
DIP-seq peaks using genomic DNA from S2 cells, Kc cells or synthetic
gDNA (whole-genome amplified). S2 peaks correspond to an overlapping
set of two biological replicate experiments; Kc cell and whole-genome
amplification experiments were performed once. d, Chromosomal
distribution of MSL2 DIP-seq peaks of experiments shown in c. The
relative size of chromosomes and the genome serve as a reference for
uniform distribution. e, Representative profiles of in vivo MSL2 ChIP-seq
and the corresponding chromatin input on chromosome 3R. Red bars
indicate the positions of CXC-dependent in vitro binding sites. Gene
models are depicted in grey at the bottom.
Extended Data Figure 2 | Analysis of MSL2 mutants in DIP-seq assays.
a, Western blots showing input and anti-Flag immunoprecipitated MSL2
proteins from a representative DIP experiment (for gel source data
see Supplementary Fig. 1). b, Chromosomal distribution of DIP-seq
peaks obtained with MSL2, MSL2 mutants and HSF (see Fig. 2d). The
chromosomal size distribution (genome) is provided for reference.
Extended Data Figure 3 | Comparison between the CXC-dependent
motif and the MRE. a, Consensus motif in CXC-independent binding 
regions (present in 164 out of 201 regions; E=2.0 x 10-'"'). b, ROC 
curves representing the PWM performances of MRE and the new motif in 
predicting whether an instance of the new motif (n = 2,651) will overlap 
with HAS (170). AUCs are provided in brackets. As our method slightly 
penalizes the MRE performance estimation (see Methods), this figure 
represents a symmetrical analysis of the new motif hits of Fig. 3b. c, Top, 
motif logos of MRE as reported previously. Middle, MRE as reported
in this study (see also Fig. 4c, top). Bottom, PionX motif as reported in
this study (Fig. 3a). d, ROC curves representing the PWM performance
comparison analogous to the result presented in Fig. 3b, including the
MRE as reported previously (labelled MRE 2008), the MRE as reported
in this study (labelled MRE) and the PionX motif (labelled new motif)
in classifying MRE instances (35,659) within HAS (266) or not. AUCs
are provided in brackets. e, Genome-wide search with the PWM of the
new motif using FIMO. q-value cut-off relation with the total number of
genomic hits (top), the number of CXC-dependent in vitro binding sites
(middle) and the X-chromosomal enrichment of motif hits (bottom).
f, To ensure that the enrichment is not solely due to performing de novo
motif discovery on mainly X-chromosomal sequences, we performed the
analysis as presented in e excluding the training regions. We conducted
the same analysis for the new motif (left) as well as the MRE (right). Top
panels depict the q-value distribution and the cut-offs used. The total
numbers of genomic hits are displayed in the centre panels, with the
corresponding X-chromosomal enrichments displayed at the bottom.
Extended Data Figure 4 | Importance of k-mer frequencies and DNA
shape for CXC-dependent MSL2 in vitro binding. a, PCA on the set of all
extended features in 2,667 genomic hit regions of the new motif (q < 0.2).
Scatter plots and corresponding scaled density plots of PC1 versus PC2.
2,613 sites not bound in vitro in a CXC-dependent manner and 54 bound
in a CXC-dependent manner are coloured grey and red, respectively.
b, ROC curves depicting the performance of simple logistic classifiers
for CXC-dependent binding on 2,667 low-stringency motif hits (q < 0.2;
54 sites CXC-bound, 2,613 sites non-CXC-bound) based on different
combinations of motif PWM scores and extended features. AUCs are
provided in brackets. c, DIP experiments testing the binding affinities of
DNA oligonucleotides representing two unbound sites (unbound 2 and 3)
and their respective sites mutated to increase the roll at position +1
(unbound 2 mut and unbound 3 mut). Results from qPCR amplification were
normalized for their input and shown as enrichment over an unbound
fragment. Data are mean ± s.e.m. for four biological replicates. d, DNA shape
features at each base position comparing CXC-bound motifs (n = 16)
to non-CXC-bound ones (n = 18) in the highest-scoring hit regions of
the new motif (q < 0.05). Differences of shape features at all positions
were evaluated by applying Wilcoxon exact rank tests with two-sided
alternatives. Only roll at position +1 had P < 0.001. As roll and helix
twist specify inter-base structural features, the corresponding bar graph
representations have been centred between the respective nucleotide
positions.
Extended Data Figure 5 | In vivo analysis of PionX sites. a, Consensus
motif found in the 25 regions where MSL2 binding is most sensitive to
depletion of MLE. b, MSL2 signal changes on 37 HAS matching
CXC-dependent in vitro binding sites or 272 non-matching ones during
MLE knockdown in S2 cells. Displayed are the mean differences of three
biological replicates. c, Western blot analysis of whole-cell extracts from
S2 and Kc cells treated with either RNAi against Sxl (two different
double-stranded RNAs) or control RNAi directed against irrelevant Gfp sequences
at different time points (for gel source data see Supplementary Fig. 1).
d, Clustered heat map of MSL2 peaks from ChIP-seq experiments
in female Kc cells treated with RNAi against Sxl for 3, 6 and 9 days.
Red bar indicates 30 sites characterized by strong MSL2 recruitment.
e, Enrichment of PionX motif hits (score > 22) and MRE motif hits
(score > 27) on D. miranda and D. pseudoobscura chromosomes relative to
Muller-B, normalized for chromosome length. The analysis included 225
and 400 PionX hits in D. miranda and D. pseudoobscura, respectively.
A total of 784 and 755 MRE hits were considered in D. miranda and
D. pseudoobscura, respectively. f, Sequence from the neo-X chromosome
chromatin entry sites compared to its counterpart on the neo-Y
chromosome as in supplementary fig. 2 of ref. 19. Motifs are highlighted in
green (neo-Y-chromosomal) and in red (neo-X-chromosomal) with their
corresponding PionX motif score in blue.
Extended Data Table 1 | CXC-dependent sites (PionX)
Extended Data Table 2 | List of oligonucleotides used in the DIP experiments


Fi ccc 
AGAACACGGTTTAGAAAGAGATAGTATTAC 

AC 
ae: AAGAATACG AGAAAGAGATAGTATTAC 


ST REL OEE oe Gh Seka CGGTTTAGAAAG 
ST REL OEE oe Gh Seka 


CG8097 wt GAAAACACGAATAAGAAAGAGATGCAAAAC 


CG8097 mut AGAAAGAGATGCAAAAC 


CG8097 GAAAAAACGAATAAGAAAGAGATGCAAAAGC 
Roll mut ATG 
CG1492 wt TTCACACGAATTCGAAAGAGATGGAAATA 
TGC 
CG1492 mut TTCATACGAACGCGAAAGAGATGGAAATA 
TGC 
CG1492 TTCAAACGAATTCGAAAGAGATGGAAATA 
roll mut TGC 
CG1492 TTCGCTCGAATTCGAAAGAGATGGAAATA 
Roll mut2 TGC 
ATAAAATGAAAAAGAAAAAGAAAAAGAAAC 
AC 
TGCACTCGAATAAGAGAGAGAGAGAGCC 
ACCT 
TGCACACGAATAAGAGAGAGAGAGAGCC 
ACCT 
=— | ACAGAAAACGAAAGAGAAAGAGATAGCGTTA 
G 
oe ACAGAACACGAAAGAGAAAGAGATAGCGTT 
AG 


Adapters are highlighted in yellow, mutations in blue. 
Insights from biochemical reconstitution into the
architecture of human kinetochores


John R. Weir, Alex C. Faesen, Kerstin Klare, Arsen Petrovic, Federica Basilico, Josef Fischböck, Satyakrishna Pentakota,
Jenny Keller, Marion E. Pesenti, Dongqing Pan, Doro Vogt, Sabine Wohlgemuth, Franz Herzog & Andrea Musacchio


Chromosomes are carriers of genetic material and their accurate 
transfer from a mother cell to its two daughters during cell division 
is of paramount importance for life. Kinetochores are crucial for 
this process, as they connect chromosomes with microtubules in 
the mitotic spindle. Kinetochores are multi-subunit complexes that
assemble on specialized chromatin domains, the centromeres, that
are able to enrich nucleosomes containing the histone H3 variant
centromeric protein A (CENP-A). A group of several additional
CENPs, collectively known as the constitutive centromere-associated
network (CCAN), establish the inner kinetochore, whereas
a ten-subunit assembly known as the KMN network creates a
microtubule-binding site in the outer kinetochore. Interactions
between CENP-A and two CCAN subunits, CENP-C and
CENP-N, have been previously described, but a comprehensive
understanding of CCAN organization and of how it contributes 
to the selective recognition of CENP-A has been missing. Here we 
use biochemical reconstitution to unveil fundamental principles of 
kinetochore organization and function. We show that cooperative 
interactions of a seven-subunit CCAN subcomplex, the CHIKMLN 
complex, determine binding selectivity for CENP-A over H3- 
nucleosomes. The CENP-A:CHIKMLN complex binds directly to 
the KMN network, resulting in a 21-subunit complex that forms 
a minimal high-affinity linkage between CENP-A nucleosomes 
and microtubules in vitro. This structural module is related to 
fungal point kinetochores, which bind a single microtubule. Its 
convolution with multiple CENP-A proteins may give rise to the 
regional kinetochores of higher eukaryotes, which bind multiple 
microtubules. Biochemical reconstitution paves the way for 
mechanistic and quantitative analyses of kinetochores. 
Kinetochores are one of the largest and most functionally intricate
molecular machines of eukaryotic cells. As many as 100 or more proteins
reside at human mitotic kinetochores; a fraction of them are core
structural components, while others play accessory regulatory roles.
Kinetochores perform two related and essential functions. First, they
bind to spindle microtubules to promote the bi-orientation of the sister
chromatids in mitosis and meiosis II, and of their homologues in meiosis
I. Second, they control the spindle assembly checkpoint, a cell cycle
checkpoint that prevents chromosome segregation before completion
of bi-orientation, thus ensuring genome stability during cell division.
The centromere is the genetic locus, unique to each chromosome,
upon which the kinetochore is established (Fig. 1a). In most
eukaryotes, maintenance of centromere identity does not require 
specific DNA sequences, but relies on epigenetic mechanisms. Crucial 
for this process is the deposition of new CENP-A at mitotic exit, which 
repopulates the CENP-A pool after its equal partition to the sister 
chromatids during DNA replication. Part of the machinery involved 
in this reaction has been identified and a molecular understanding 
of this process is emerging. Among other requirements, the
recognition of CENP-A by other kinetochore proteins appears to play
a fundamental role in the epigenetic specification of centromeres.
Previous studies established that CENP-C and CENP-N recognize
the divergent C-terminal tail and the CENP-A-targeting domain
(CATD) of CENP-A, respectively (refs 10, 11) (Extended Data Fig. 1). Whether
CENP-C and CENP-N act in a complex on single or distinct CENP-A 
nucleosomes, and whether other CCAN subunits contribute to their 
interactions with CENP-A, is not known. To shed light on this problem 
and on additional functional and structural aspects of kinetochore 
function, we embarked on a biochemical reconstitution of human 
kinetochores in vitro with purified components (Fig. 1b and Extended 
Data Fig. 1; for gel source data, see Supplementary Fig. 1). Here, we 
report the main conclusions emerging from this effort. 

Similarly to their Saccharomyces cerevisiae homologues, human
CENP-L and CENP-N formed a stoichiometric complex (Extended
Data Fig. 2a). After solid-phase immobilization, CENP-LN interacted
preferentially with octameric CENP-A nucleosome core particles
(CENP-A^NCP) compared with H3 nucleosome core particles (H3^NCP)
(Fig. 1c). Analogous results were obtained with electrophoretic
mobility shift assays (EMSA; Extended Data Fig. 3a). In size-exclusion
chromatography (SEC) experiments, which separate macromolecules
on the basis of their size and shape, a recombinant construct
encompassing residues 1-544 of CENP-C (CENP-C^1-544, which
embeds the first nucleosome-binding motif of CENP-C; Extended Data
Figs 1 and 2b) co-eluted with CENP-A^NCP but not with H3^NCP (both
reconstituted on a 145-bp fragment of '601' DNA (ref. 16)) (Fig. 1d
and Extended Data Fig. 2e, f). SEC also demonstrated that CENP-LN
binds directly to CENP-C^1-544 (Fig. 1e, light blue trace), in line with
two recent studies (refs 17, 18), and directly, but more weakly, to the
four-subunit CCAN subcomplex CENP-HIKM (ref. 19) (Fig. 1e, green
trace). Together with our previous demonstration that CENP-C^1-544
binds CENP-HIKM (ref. 20), these interactions suggest that CENP-LN,
CENP-HIKM, and CENP-C^1-544 bind in a single complex. SEC
readily confirmed this hypothesis (Fig. 1e, orange trace). We refer to
this seven-subunit complex as the 'CHIKMLN' complex (note that
'C' implies CENP-C^1-544, not full-length CENP-C). Molecular mass
estimates obtained by sedimentation velocity analytical
ultracentrifugation (AUC) indicated that there is a single copy of each subunit
in the HIKM and CHIKMLN complexes (Table 1 and Extended Data
Fig. 4a, b).

Both CENP-C^1-544 and CENP-LN bind CENP-A. We therefore
asked whether their presence in the same complex increases the
binding affinity for CENP-A compared with individual binders. We
incubated CENP-A^NCP with growing concentrations of CENP-C^1-544,
CENP-HIKM, CENP-LN, or the CHIKMLN complex and determined
relative binding affinities by EMSA (Fig. 2a). These experiments
revealed that, besides CENP-C and CENP-LN, CENP-HIKM also binds
CENP-A^NCP. However, CENP-HIKM did not bind CENP-A^NCP with
an affinity sufficient for co-elution in SEC experiments (not shown),
binding with apparently similar affinity to free DNA and to H3^NCP
(Extended Data Fig. 3b). Notably, when bound in a single complex,
the CCAN subunits showed the highest affinity for CENP-A^NCP,
indicative of cooperative binding (Fig. 2a). CHIKMLN co-eluted with
CENP-A^NCP in a single high-molecular-mass complex from an SEC
column, while a much more modest shift was observed with H3^NCP,
demonstrating that the interaction is selective for CENP-A^NCP (Fig. 2b).
Additional binding experiments, shown in Extended Data Fig. 3c-e,
confirmed the selectivity of the interaction of the CHIKMLN complex
with CENP-A^NCP.

Using AUC, we showed that a single CENP-A^NCP binds two
CENP-LN or CHIKMLN complexes (Table 1 and Extended Data
Fig. 4). Chemical crosslinking coupled with mass spectrometry
(XL-MS, ref. 21) identified an extensive network of interactions
of the CHIKMLN subunits with themselves and with CENP-A^NCP.
CENP-C^1-544, which may be largely intrinsically disordered, contacts
all other subunits, with the exclusion of histone H2A, thus emerging
as the backbone of CHIKMLN (Fig. 2c and Supplementary Table 1).
In line with the EMSA results in Fig. 2a, we found several crosslinks
between the CENP-HIKM complex and CENP-A^NCP.

Figure 1 | Reconstitution of the CHIKMLN complex. a, Layered
organization of the kinetochore with schematic depiction of subcomplexes.
Those in coloured boxes were included in the reconstitution.
b, Coomassie-stained SDS-PAGE of recombinant proteins used in
this study. c, CENP-LN binds preferentially to CENP-A over H3.
GST-tagged CENP-L in complex with CENP-N on GSH-sepharose beads was
combined with either CENP-A^NCP or H3^NCP. Data from three independent
experiments were quantified. Shown are mean ± s.d. d, CENP-C^1-544
binds preferentially to CENP-A^NCP over H3^NCP. SEC elution profiles of
CENP-C^1-544 (red trace), CENP-C^1-544 mixed with CENP-A^NCP (purple), and
CENP-C^1-544 mixed with H3^NCP (grey). Shift in the elution profile indicates
binding of CENP-C^1-544 to CENP-A^NCP. a.u., arbitrary units. e, CENP-LN
(dark blue trace) forms a complex when mixed with CENP-C^1-544 (light
blue trace) or CENP-HIKM (light green trace). These interactions are
compatible and lead to formation of a seven-subunit 'CHIKMLN' complex
(orange trace). Elution of individual proteins or complexes is shown in
Extended Data Fig. 2.

We asked if the network of interactions linking the CHIKMLN
subunits and the CENP-A nucleosome correlated with localization
co-dependencies in HeLa cells. Individual depletions of CENP-C,
CENP-H, CENP-L, or CENP-M by RNA interference (RNAi) led to a
near-complete disappearance of the other CHIKMLN subunits from
kinetochores during interphase (Fig. 2d and Extended Data Fig. 5a–c).
Significant levels of CENP-A were present on kinetochores at the time
of fixation for indirect immunofluorescence, suggesting that the loss
of CHIKMLN subunits is not caused by complete co-depletion of
CENP-A, but rather by the co-dependency of CHIKMLN subunits for
stable kinetochore recruitment. Collectively, our observations indicate
that the stability of the CHIKMLN complex, as well as its binding
affinity and selectivity for the CENP-A^NCP, arise from the reciprocal
interactions of its subunits.

Table 1 | Sedimentation velocity AUC of selected kinetochore complexes.

Exp.  Complex                                             Predicted    Observed     Frictional  Condition  Sedimentation    Predicted stoichiometry
                                                          mass (kDa)   mass (kDa)   ratio                  coefficient (S)
1     CENP-HIKM                                           160          145          1.6         A          4.2              1:1:1:1
2     CENP-CHIKMLN*                                       301          266          1.7         A          6.2              1:1:1:1:1:1:1
3     CENP-A^NCP                                          200          206          1.4         B          10.7             octamer
4     CENP-LN + CENP-A^NCP                                357          360          1.76        B          10.9             2 CENP-LN:1 octamer
5     CENP-A^NCP                                          200          ND           ND          C          7.1              ND
6     CENP-LN + CENP-A^NCP                                357          ND           ND          C          7.9              ND
7     CENP-CHIKMLN† + CENP-A^NCP                          760          ND           ND          C          16.2             ND
8     Chimeric BFP-tagged H3/CENP-A^NCP                   252          293          1.3         C          12.1             octamer
9     CENP-CHIKMLN† + chimeric‡ BFP-tagged H3/CENP-A^NCP  812          799          1.35        C          18.4             2 CENP-CHIKMLN:1 octamer

ND, not determined. Condition A: 20 mM HEPES pH 7.6, 300 mM NaCl, 2.5% (v/v) glycerol, 1 mM TCEP. Buffer density 1.01365 g ml^-1 and viscosity 1.307 centipoise (cP). Run was performed at 10 °C.
Condition B: 20 mM HEPES pH 7.5, 10% glycerol, 150 mM NaCl, 1 mM EDTA, 2 mM TCEP. Buffer density 1.03503 g ml^-1 and viscosity of 1.002 cP. Runs were performed at 20 °C.
Condition C: 20 mM Tris pH 8.5, 150 mM NaCl, 2.5% (v/v) glycerol, 1 mM TCEP. Buffer density 1.01365 g ml^-1 and viscosity 1.307 cP. Run was performed at 10 °C.
CENP-HIKM = CENP-HI^57-756KM; CENP-CHIKMLN* = CENP-C^1-544HI^57-756KMLN; CENP-CHIKMLN† = CENP-C^289-544HI^57-756KMLN.
‡The chimaera includes an N-terminal histidine tag, followed by the N-terminal region of H3.1, followed by the C-terminal region of CENP-A: 6His-H3.1(Ala2-Ile75)/CENP-A(Cys75-Gly140).
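
The stoichiometries in Table 1 follow from comparing observed masses with the masses predicted for different copy numbers. The sketch below illustrates that arithmetic only; it is not the AUC analysis itself. The function name is hypothetical, and the per-complex masses used in the usage lines (78.5 kDa for CENP-LN, 280 kDa for the shorter CHIKMLN variant) are back-calculated here from the predicted masses in Table 1, which is an assumption of this example.

```python
def infer_copies(observed_kda, scaffold_kda, ligand_kda, max_copies=4):
    """Copy number n whose predicted mass (scaffold + n * ligand)
    lies closest to the observed mass."""
    return min(
        range(max_copies + 1),
        key=lambda n: abs(scaffold_kda + n * ligand_kda - observed_kda),
    )


# Experiment 4: observed 360 kDa on a 200 kDa octamer.
# CENP-LN mass back-calculated as (357 - 200) / 2 = 78.5 kDa.
copies_ln = infer_copies(360, 200, 78.5)

# Experiment 9: observed 799 kDa on a 252 kDa chimeric octamer.
# CHIKMLN variant mass back-calculated as (812 - 252) / 2 = 280 kDa.
copies_chikmln = infer_copies(799, 252, 280)
```

In both cases the observed mass is far closer to the two-copy prediction than to the one-copy or three-copy alternatives.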

CENP-T and CENP-W, two additional CCAN subunits (Fig. 1a),
contain histone-fold domains and form a tight dimer that further
associates with a dimer of two additional histone-fold-domain proteins,
the CENP-SX complex. The CENP-TWSX complex has been
proposed to form a CENP-C-independent (but CENP-A-dependent)
axis of kinetochore assembly, but other reports have suggested that
kinetochore localization of the CENP-TWSX complex depends on
CENP-C. In agreement with the latter, recruitment of the CENP-TWSX
complex to CENP-A^NCP requires both CENP-C^1-544 and the
CENP-HIKM complex (Extended Data Fig. 6).

The KMN network is made of three subcomplexes, the KNL1
complex (KNL1-C), the MIS12 complex (MIS12-C), and the NDC80
complex (NDC80-C). It forms the outer kinetochore and binds
microtubules (Fig. 1a). After characterizing CHIKMLN as the
CENP-A^NCP-associated complex, we asked if it was also competent to recruit
the outer kinetochore components. The CENP-A^NCP:CHIKMLN
complex readily bound to a reconstituted 10-subunit KMN network
complex and all components co-eluted from a SEC column, forming a
17-subunit kinetochore complex bound to an octameric CENP-A^NCP
(Fig. 3a, black trace). XL-MS of this complex revealed an extensive
network of interactions around a hub represented by the MIS12
complex (Extended Data Fig. 7 and Supplementary Table 2). The MIS12
complex formed crosslinks to several outer-kinetochore subunits,
as well as to inner-kinetochore subunits, including CENP-C^1-544,
CENP-K, and CENP-N.

We asked if reconstituted kinetochore particles could translocate
centromeric chromatin onto microtubules. After immobilizing
microtubules on a coverslip, we tested the ability of fluorescently
labelled kinetochore components to interact with them (Fig. 3b and
Extended Data Fig. 8a, b). The KMN network bound microtubules in
the absence of other components, fitting with its well-established role
as a microtubule receptor at the kinetochore. Neither CHIKMLN
nor CENP-A^NCP, either together or in isolation, decorated
microtubules in the absence of the KMN network (Fig. 3b). Conversely, when
the KMN network was added alongside the CHIKMLN complex and
CENP-A^NCP, they strongly decorated microtubules (Fig. 3b),
demonstrating that the KMN network and the CHIKMLN complex
create a direct bridge between centromeres and microtubules. Only
weak binding to microtubules was observed when CENP-A^NCP was
replaced with H3^NCP (Extended Data Fig. 8a–c). Addition of CHIKMLN
enhanced KMN binding to microtubules (Fig. 3c), a phenomenon
whose mechanistic basis will require further investigation. Addition of
CENP-A^NCP, but not of H3^NCP, further increased the binding affinity
of the KMN for microtubules (Fig. 3b-d). It is likely that this effect
reflects multivalency arising from the incorporation of two CHIKMLN
complexes on a single CENP-A^NCP (AUC analysis in Table 1 and
Extended Data Fig. 4), which probably allows for the binding of two
KMN assemblies to the same particle. Because CENP-TW recruits
additional NDC80-C to kinetochores, it may contribute to further
enhancements of the microtubule-binding capacity.

Figure 2 | Selective cooperative binding of CHIKMLN to CENP-A
mononucleosomes. a, EMSAs with the binding species indicated.
Shown is mean ± s.d. from three independent experiments. b, CHIKMLN
complex (red trace) binds to CENP-A (green trace) but not H3 (blue
trace). The composition of the CENP-A^NCP:CHIKMLN complex was
confirmed by western blotting of several subunits. c, Topology of
the CENP-A^NCP:CHIKMLN complex. XL-MS revealed a network of
crosslinked peptides. Intra-protein crosslinks are shown as red lines,
intra-subcomplex crosslinks are shown as orange lines, inter-subcomplex
crosslinks are shown as black lines. Proteins are coloured according
to subcomplex (CENP-C, red; CENP-LN, blue; CENP-HIKM, green;
CENP-A^NCP, purple). d, Cooperative localization of CHIKMLN to
kinetochores. Quantification of kinetochore levels of CENP-C (red
bars), CENP-HK (green bars), and CENP-A (purple bars) measured by
immunofluorescence (IF) in control HeLa cells (left) or after depletion
of the indicated CCAN subunits by short interfering RNAs (siRNAs)
(see Methods). Localizations of CENP-C and CENP-HK are significantly
perturbed, despite relatively high residual CENP-A levels. Graphs and
bars indicate mean ± s.e.m. of 2 or 3 independent experiments quantifying
between 514 and 1,249 kinetochores in 13-25 cells. Representative
immunofluorescence images are shown in Extended Data Fig. 5.

Figure 3 | KMN and CHIKMLN connect CENP-A to microtubules.
a, The CHIKMLN:CENP-A^NCP complex (green trace) and the KMN
(yellow trace) were mixed, run on a Superose 6 SEC column (black trace),
and analysed by SDS-PAGE. All 17 components of the kinetochore
and the 4 subunits of the CENP-A^NCP shift together. Analysis of peak
fraction (boxed and marked 'A') by Coomassie staining and western
blot demonstrates co-elution of all subunits. b, Microtubule binding
assay. Rhodamine-labelled microtubules (red channel) were tethered
to glass coverslips and incubated in the presence of green fluorescent
protein (GFP)-KMN (green), Alexa-405-labelled CHIKMLN (blue),
or Alexa-647-labelled CENP-A^NCP (purple), and combinations thereof.
c, d, Quantification of fluorescence data (see Extended Data Fig. 8 and
Methods). 'CA' and 'H3' indicate CENP-A^NCP and H3^NCP, respectively.
Shown for each channel is mean ± s.e.m. from at least 20 microtubules in
at least 2 independent experiments.

CENP-C^545-943, like CENP-C^1-544, contains a nucleosome-binding
element (the CENP-C motif; Extended Data Fig. 1), reported to
bind NCPs with a moderate preference for CENP-A^NCP over H3^NCP
(ref. 11). In vitro, C-terminal segments of CENP-C interacted with
CENP-A^NCP but did not recapitulate any of the interactions with
CHIKMLN subunits observed with CENP-C^1-544 (Extended Data
Fig. 9a, b, summarized in Fig. 4a). The specific topology of the CENP-C
protein and its interactions are shown in Extended Data Fig. 9c.

The point kinetochore of Saccharomyces cerevisiae consists of a single
CENP-A (also known as Cse4) nucleosome associated with proteins
that are evolutionarily related to the CCAN subunits of humans
(Fig. 4b). Our finding that the human CCAN subunits and the KMN
network form a single, apparently stoichiometric complex on a
CENP-A^NCP suggests that the human and S. cerevisiae kinetochores form a unit
of similar architecture. At regional centromeres, which may extend over
several megabases of genomic DNA, up to 200 CENP-A nucleosomes
intersperse with conventional H3 nucleosomes at an approximate ratio
of 1 CENP-A nucleosome over 25 H3 nucleosomes. We therefore
propose that kinetochores built on regional centromeres represent the
convolution with multiple CENP-A nucleosomes of the structural unit
identified by our in vitro reconstitution (Fig. 4b).

Here we have reported the production of entirely synthetic 
kinetochores that specifically bind centromeric chromatin while 
mediating a simultaneous connection to microtubules. We note in 
this context that the reconstitution of functional kinetochore particles 
on an octameric CENP-A NCP may set a benchmark to resolve an ongoing
discussion on the actual structure of the CENP-A nucleosome”.
Our efforts complement previous studies with kinetochore particles 
isolated from S. cerevisiae*®”* or reconstitutions on CENP-A arrays in 
extracts of Xenopus laevis*®. Synthetic kinetochores have the potential 
to drive new inroads into the structural characterization of kinetochore 
architecture, which remains largely unknown. The manipulation of 
recombinant kinetochores in vitro may allow molecular insight into 
crucial kinetochore functions including the regulation of microtubule 
binding, of the spindle assembly checkpoint and of new CENP-A 
deposition. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized. The investigators were not blinded to allocation during 
experiments and outcome assessment. 

Production of recombinant proteins. CENP-LN was produced as a GST fusion 
construct from insect cells using the MultiBac expression system*!. Specifically, 
a coding sequence expressing 3C cleavable GST-tagged CENP-L was sub-cloned 
into MCS2, and the coding sequence of CENP-N was sub-cloned into MCS1 of 
pEL. Bacmid was then produced from EMBacY cells*!, and subsequently used to 
transfect Sf9 cells and produce baculovirus. Baculovirus was amplified through 
three rounds of amplification and used to infect Tnao38 cells*. Cells infected with 
the GST-CENP-L/CENP-N virus were cultured for 72 h before harvesting. Cells
were washed and resuspended in lysis buffer (50 mM Na-HEPES, 300 mM NaCl,
10% glycerol, 4 mM 2-mercaptoethanol, 1 mM MgCl2, pH 7.5). Resuspended cells
were lysed by sonication in the presence of Benzonase before clearance at 100,000g 
at 4°C for 1h. Cleared lysate was passed over GSH-Sepharose, before extensive 
washing with lysis buffer. GST-CENP-L/CENP-N complex was then eluted in lysis 
buffer + 20 mM reduced glutathione. Eluted protein was concentrated in a 30 kDa
Amicon-Ultra-15 Centrifugal Filter (Millipore) in the presence of GST-tagged 3C
protease. Concentrated protein was then loaded onto a Superdex 200 16/600 column
equilibrated in 20 mM Na-HEPES pH 7.5, 300 mM NaCl, 2.5% glycerol. A 5 ml
GSH-Sepharose FF column was connected in series after the Superdex 200 column 
to trap GST, un-cut GST-CENP-L/CENP-N and GST-tagged 3C protease. Peak 
fractions corresponding to CENP-L/CENP-N were collected and again concentrated 
in a 30 kDa MWCO concentrator to approximately 50-100 µM before being flash
frozen in liquid N2 and stored at −80°C.

Synthetic, codon-optimized DNA (Geneart), encoding the human
CENP-C(1-544)His, CENP-C(189-544), or CENP-C(545-943) was sub-cloned into pFL or pFG
(containing an N-terminal 3C cleavable GST) vectors, respectively, by restriction
cloning with the enzymes BamHI and SalI. A non-cleavable histidine tag
comprising six histidines (His6-tag) was introduced C-terminally of
CENP-C(1-544)His; a tobacco etch virus (TEV) cleavage site was introduced N-terminal
of CENP-C(545-943). Tnao38 cells expressing CENP-C(1-544)His, CENP-C(189-544), or
CENP-C(545-943) were resuspended in lysis buffer (20 mM HEPES pH 7.5, 500 mM
NaCl, 10% glycerol, 2 mM β-mercaptoethanol) and lysed by sonication before
centrifugation at 100,000g at 4°C for 1 h. The cleared lysates were incubated with
Ni-NTA Agarose beads (for CENP-C(1-544)His), GST-Trap affinity column (GE
Healthcare, for CENP-C(189-544)) or Glutathione Sepharose 4 Fast Flow beads (for
CENP-C(545-943)) at 4°C for 2 h. After washing with 70 column volumes of lysis
buffer, CENP-C(1-544)His was eluted with lysis buffer supplemented with 200 mM
imidazole, CENP-C(189-544) was eluted in lysis buffer supplemented with 30 mM
reduced glutathione, and CENP-C(545-943) was cleaved off the beads in 16 h at 4°C
by addition of TEV protease. After elution, proteins were diluted in buffer A
(20 mM HEPES pH 7.5, 5% glycerol, 1 mM TCEP) to achieve a final concentration
of 300 mM NaCl, loaded onto a pre-equilibrated HiTrap Heparin HP column,
and eluted with a linear gradient of buffer B (20 mM HEPES pH 7.5, 2 M NaCl,
5% glycerol, 1 mM TCEP) from 300 to 1,200 mM NaCl. Fractions
containing CENP-C(1-544)His and CENP-C(545-943) were loaded onto a Superdex
200 16/60 SEC column pre-equilibrated in SEC buffer (10 mM HEPES pH 7.5,
300 mM NaCl, 2.5% glycerol, 2 mM TCEP). For CENP-C(189-544), the GST tag
was cleaved using 3C protease and the protein concentrated in a 10 kDa MWCO
concentrator. The protein was then further purified by SEC as described for the
other two constructs. SEC fractions containing CENP-C(1-544)His, CENP-C(189-544),
or CENP-C(545-943) were concentrated, flash-frozen in liquid nitrogen, and stored
at −80°C.

NDC80-GFP complexes were constructed with a C-terminal fusion of GFP to 
HEC1. The unlabelled NDC80 complex was constructed with an N-terminal fusion 
of a His6-tag to SPC25. Constructs for insect cell expression exploited the MultiBac
baculovirus expression system*!. Bacmid was then produced from EMBacY cells, 
and subsequently used to transfect Sf9 cells and produce baculovirus. Baculovirus 
was amplified through three rounds of amplification and used to infect Tnao38 
cells. Cells infected with virus expressing untagged NDC80 were cultured for 72h 
before harvesting. Cells were washed and resuspended in lysis buffer (25 mM
Na-HEPES, 300 mM NaCl, 10% glycerol, 1 mM TCEP, 1 mM MgCl2, pH 7.5,
and 1 mM PMSF). Resuspended cells were lysed by sonication in the presence
of Benzonase before clearance at 100,000g at 4°C for 1h. Cleared lysate was 
passed over Ni-Sepharose, before extensive washing with lysis buffer. The Ndc80 
complex was then eluted in lysis buffer + 250 mM imidazole. Eluted protein was 
diluted to 50mM NaCl using buffer A (25mM Na-HEPES, 10% glycerol, 1 mM 
TCEP) and loaded on a ResQ anion-exchange column. The NDC80-GFP was 
eluted using a salt gradient over 30 column volumes to 500mM NaCl using 
buffer B (25mM Na-HEPES, 1,000 mM NaCl, 10% glycerol, 1mM TCEP). The 
eluted protein was concentrated in a 30-kDa Amicon-Ultra-15 Centrifugal Filter
(Millipore) and the concentrated protein was then loaded onto a Superdex 200
16/600 column equilibrated in 10 mM Na-HEPES, 150 mM NaCl, 2.5%
glycerol, pH 7.5. Peak fractions containing the NDC80 complex were collected
and again concentrated in a 30 kDa MWCO concentrator to approximately 10 µM
before being flash frozen in liquid N2 and stored at −80°C.
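
The dilution of the Ni-Sepharose eluate from 300 mM to 50 mM NaCl with buffer A follows from a simple salt mass balance. The Python sketch below is illustrative only: the function name and the 10 ml example volume are assumptions rather than values from the protocol, and buffer A is taken to contain no NaCl, as its listed composition suggests.

```python
def diluent_volume_ml(sample_ml, start_mM, target_mM, diluent_mM=0.0):
    """Volume of diluent (ml) needed to bring a sample to the target salt concentration.

    Mass balance: sample_ml*start_mM + V*diluent_mM = (sample_ml + V)*target_mM
    """
    if not (diluent_mM < target_mM <= start_mM):
        raise ValueError("target must lie between diluent and starting concentration")
    return sample_ml * (start_mM - target_mM) / (target_mM - diluent_mM)


# Hypothetical example: 10 ml of eluate at 300 mM NaCl, diluted with salt-free buffer A
eluate_ml = 10.0  # assumed volume, not given in the text
buffer_a_ml = diluent_volume_ml(eluate_ml, start_mM=300.0, target_mM=50.0)
fold_dilution = (eluate_ml + buffer_a_ml) / eluate_ml
print(f"Add {buffer_a_ml:.1f} ml buffer A ({fold_dilution:.0f}-fold dilution)")
```

Under these assumptions the eluate has to be diluted sixfold before it is loaded on the anion-exchange column.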

Codon-optimized human CENP-I(57-756) (57-C) was subcloned in a MultiBac
pFL-derived vector?! with an N-terminal TEV cleavable His6-tag, under the 
control of the polh promoter. A complementary DNA (cDNA) segment encoding 
human CENP-M isoform 1 was subcloned in the second MCS of the same vector, 
under the control of the p10 promoter. Simultaneously, a second pFL-based vector 
was created with untagged CENP-H and CENP-K under the control of the polh 
and p10 promoters, respectively. The CENP-I/M vector was then linearized with 
BstZ17I, and the expression region of the CENP-H/K vector was PCR amplified
with primers designed for sequence and ligation independent cloning (SLIC) of the 
PCR fragment into the linearized CENP-I/M vector. The SLIC reaction was then 
performed to produce a single pFL-based vector with four expression cassettes. 
Constructs were sequence verified. Baculovirus was then produced and amplified 
with three rounds of amplification. 

Expression of CENP-HI(57-C)KM complex was performed in Tnao38 cells, using a
virus:culture ratio of 1:40. Infected cells were incubated for 72 h at 27°C. Cell pellets
were harvested, washed in 1x PBS, and finally resuspended in a buffer containing
50 mM HEPES pH 7.5, 300 mM NaCl, 1 mM MgCl2, 10% glycerol, 5 mM imidazole,
2 mM β-mercaptoethanol, 0.1 mM AEBSF, and 2.5 units per millilitre Benzonase
(EMD/Millipore). Cells were lysed by sonication, and cleared for 1 h at 100,000g.
Cleared cell lysate was then run over a 5 ml Talon superflow column (Clontech) and
then washed with 50 mM HEPES pH 7.5, 1 M NaCl, 10% glycerol, 5 mM imidazole, and
2 mM β-mercaptoethanol. CENP-HI(57-C)KM complex was eluted with a gradient
of 5-300 mM imidazole, the fractions containing CENP-HI(57-C)KM were pooled,
and the His tag was cleaved overnight at 4°C. CENP-HI(57-C)KM in solution was then
adjusted to a salt concentration of 100 mM and a pH of 6.5, before loading on a
6 ml Resource S ion-exchange column (GE Healthcare), equilibrated in 20 mM
MES pH 6.5, 100 mM NaCl, 2 mM β-mercaptoethanol. CENP-HI(57-C)KM was then
eluted with a gradient of 100-1,000 mM NaCl over 20 column volumes, and peak
fractions corresponding to CENP-HI(57-C)KM were pooled and concentrated in a
50 kDa MW Amicon concentrator (Millipore). CENP-HI(57-C)KM was then loaded
onto a Superdex 200 16/600 (GE Healthcare) in 20 mM HEPES pH 7.5, 150 mM NaCl,
2.5% glycerol, 2 mM TCEP. The sample was concentrated and flash frozen in liquid
N2 before use. CENP-HI(57-C)KM complex was labelled using the Alexa Fluor 405
C5 Maleimide kit (Thermo Fisher Scientific).

A cDNA segment encoding residues 459-561 (the histone fold, HF) of human 
CENP-T isoform 1, was subcloned in pGEX-6P-2rbs vector as a C-terminal fusion 
to GST, with an intervening 3C protease site. A cDNA segment encoding human 
CENP-W was subcloned in the second cassette of the same vector. Similarly, a 
synthetic cDNA segment encoding human CENP-X isoform 1, codon-optimized 
for expression in bacteria, was subcloned in pGEX-6P-2rbs vector as a C-terminal 
fusion to GST, with an intervening 3C protease site. Also, a cDNA segment 
encoding human CENP-S isoform 1, was subcloned in the second cassette of 
the same vector. Constructs were sequence-verified. The expression and puri- 
fication procedure was the same for CENP-T/CENP-W and CENP-S/CENP-X 
complexes. Escherichia coli BL21 Rosetta cells harbouring vectors expressing GST- 
CENP-T/CENP-W or GST-CENP-X/CENP-S were grown in Terrific Broth at 37°C 
to an absorbance at 600 nm (A600 nm) of 0.6-0.8, then 0.3 mM IPTG was added
and the culture was grown at 20°C overnight. Cell pellets were resuspended in 
lysis buffer (25 mM Tris/HCl pH 7.5, 300 mM NaCl, 10% glycerol, 1mM DTT) 
supplemented with protease inhibitor cocktail (Serva), lysed by sonication, and 
cleared by centrifugation at 48,000g at 4°C for 1 h. The cleared lysate was applied 
to Glutathione Sepharose 4 Fast Flow beads (GE Healthcare) pre-equilibrated in 
lysis buffer, incubated at 4°C for 2h, washed with 70 volumes of lysis buffer and 
subjected to an overnight cleavage reaction with 3C protease. A heparin column 
(GE Healthcare) was pre-equilibrated in a mixture of 85% buffer A (20 mM 
Tris/HCl pH 7.5, 5% glycerol, 1mM DTT) and 15% buffer B (20 mM Tris/HCl 
pH 7.5, 2M NaCl, 5% glycerol, 1mM DTT). The eluate from glutathione beads 
was directly loaded onto the heparin column and eluted with a linear gradient 
of buffer B from 300 to 1,200 mM NaCl in ten bed column volumes. Fractions 
containing CENP-T(HF)/CENP-W or CENP-S/CENP-X were concentrated in 
10-kDa-cut-off Vivaspin concentrators (Sartorius) and loaded onto a Superdex 75 
size-exclusion chromatography (SEC) column (GE Healthcare) pre-equilibrated in 
SEC buffer (20 mM HEPES pH 7.5, 300mM NaCl, 5% glycerol, 1 mM TCEP). SEC 
was performed under isocratic conditions at a flow rate of 0.5 ml/min. Fractions 
containing CENP-T(HF)/CENP-W or CENP-S/CENP-X were concentrated. To 
form the T(HF)WSX complex, T(HF)W was added to SX at a 1.5-fold molar excess,
incubated for 1 h on ice, and then subjected to separation on a Superdex 200
size-exclusion column to separate tetrameric T(HF)WSX complex from T(HF)W
dimers. Fractions containing the tetrameric T(HF)WSX complex were then
concentrated in a 10-kDa MWCO concentrator to a concentration of 50-250 µM,
and flash-frozen.
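
The heparin step above is specified both as a buffer mixture (85% buffer A, 15% buffer B) and as a salt range (300 to 1,200 mM NaCl). The two descriptions can be checked against each other with a short calculation. This is a minimal sketch, assuming that buffer A contains no NaCl and buffer B contains 2 M NaCl as listed; the function names are hypothetical.

```python
def nacl_mM(percent_b, buffer_a_mM=0.0, buffer_b_mM=2000.0):
    """NaCl concentration (mM) of a mixture containing percent_b % buffer B."""
    frac_b = percent_b / 100.0
    return (1.0 - frac_b) * buffer_a_mM + frac_b * buffer_b_mM


def percent_b_for(target_mM, buffer_a_mM=0.0, buffer_b_mM=2000.0):
    """Percentage of buffer B needed to reach target_mM NaCl."""
    return 100.0 * (target_mM - buffer_a_mM) / (buffer_b_mM - buffer_a_mM)


start_salt = nacl_mM(15.0)          # equilibration mixture: 85% A + 15% B
end_percent_b = percent_b_for(1200.0)  # end point of the linear gradient
print(f"15% B corresponds to {start_salt:.0f} mM NaCl")
print(f"1,200 mM NaCl corresponds to {end_percent_b:.0f}% B")
```

With these assumptions the equilibration mixture matches the 300 mM NaCl starting point of the gradient, and the gradient ends at 60% buffer B.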

H3 containing NCPs. Plasmids for the production of X. laevis H2A, H2B, 
H3 and H4 histones were a gift from D. Rhodes. X. laevis histone expression 
and purification, refolding of histone octamers or H2A:H2B dimers, and 
reconstitution of H3 containing mononucleosomes were performed precisely as 
described*’. Plasmids for the production of the ‘601’ 145-bp DNA were a gift from 
C. A. Davey. DNA production was performed as described*? with no modifications. 
For Alexa-647-labelled nucleosomes, the 145-bp DNA fragments (601-Widom) 
were amplified using fluorescently labelled primers (Sigma-Aldrich). Biotinylated 
nucleosomes were reconstituted using commercial synthetic 145-bp DNA 
fragments (601-Widom) (Epicypher). 

CENP-A containing NCPs. Plasmids for the production of human CENP- 
A:H4 histone tetramer were a gift of A. E. Straight. Preparations of CENP-A- 
containing NCPs were performed precisely as described*4. For Alexa-647-labelled 
nucleosomes, the 145-bp DNA fragments (601-Widom) were amplified using 
fluorescently labelled primers (Sigma-Aldrich, St. Louis, Missouri, USA). 
Biotinylated nucleosomes were reconstituted using commercial synthetic 145- 
bp DNA fragments (601-Widom) (Epicypher, Durham, North Carolina, USA). 
H3.1/CENP-A-chimaera with H2B-BFP histone octamer. Polycistronic- 
coexpression plasmid pETDuet-6HisH3.1/CENP-A-H4-6His-H2A-H2B-BFP 
was generated on the basis of the strategy described previously** with human 
histone sequences. The coding sequences of the open reading frames of 6His- 
H3.1(Ala2-Ile75)/CENP-A(Cys75-Gly140), H4, 6His-H2A1B, and H2B1J-TagBFP 
were sub-cloned between NcoI and XhoI sites of pETDuet-1 using conventional
cloning techniques and the Gibson cloning**. The H3 and CENP-A segments of 
the chimaera paste within the α1-helix in a structurally seamless manner. One
ribosome-binding site was placed upstream of each open reading frame of these 
four recombinant histones. A TEV protease site was placed between 6His-tag and 
H3.1/CENP-A-chimaera and a PreScission protease site was placed between 6His- 
tag and H2A1B to allow tag-removal during protein purification. 

Protein expression and purification of BFP-labelled H3.1/CENP-A-chimaera 
histone octamer followed a previous study*> with minor modifications. 
Purification of the octamer was done according to the previous study*° with minor 
modifications. After Ni-affinity purification, the octamers were incubated for 15h 
at 4°C with His-TEV protease and His-PreScission protease in buffer A containing 
20mM Tris-HCl pH 8.0, 1.0M sodium chloride, 1 mM tris(2-carboxyethyl)- 
phosphine (TCEP). The tag-removed octamers were concentrated in buffer B 
(20 mM Tris-HCl pH 8.0, 2.0 M sodium chloride, 1 mM TCEP) and further purified 
using Superdex 200 10/300 GL gel-filtration column (GE Healthcare) equilibrated 
with buffer B. Fractions containing the octamers were pooled, concentrated and 
stored at −80°C until used for nucleosome reconstitution.

Analytical SEC analysis. Analytical SEC was performed on a custom-made
Superose 6 5/200 in a buffer containing 20 mM HEPES, 300 mM NaCl, 2.5% glycerol,
2 mM TCEP, pH 7.5 on an ÄKTAmicro system. As indicated, the following additional
columns were used: Superdex 200 5/150 Increase and Superose 6 5/150. All
samples were eluted under isocratic conditions at 4°C in SEC buffer (20 mM HEPES
pH 7.5, 300 mM NaCl, 2.5% glycerol, 2 mM TCEP) at a flow rate of 0.2 ml/min.
Elution of proteins was monitored at 280 nm. Fractions (100 µl) were collected and
analysed by SDS-PAGE and Coomassie blue staining. To detect the formation of a
complex, proteins were mixed at the indicated concentrations in 50 µl, incubated
for at least 2 h on ice and then subjected to SEC.

Kinetochore-microtubule binding assay. Coverslips and glass slides were cleaned 
by sonication in isopropanol and 1 M KOH or 1% Hellmanex and 70% ethanol, 
respectively. After functionalization of coverslips with 5% biotinylated
poly-L-lysine-PEG for 30 min, flow cells were created with a volume of 10-15 µl. Flow
cells were passivated with 1% pluronic F-127 for 1 h and coated with avidin for
30-45 min. After incubation with 10 nM microtubules (10% biotinylated, 10%
rhodamine labelled, Cytoskeleton, polymerized according to the manufacturer's
instructions) for 10-20 min, proteins (400 nM) were added in 80 mM Pipes
(pH 6.8), 125 mM KCl, 1 mM EGTA, 1 mM MgCl2, and 20 µM Taxol. Flow cells
were sealed with wax and imaged with spinning disk confocal microscopy on
a 3i Marianas system (Intelligent Imaging Innovations, Göttingen, Germany)
equipped with Axio Observer Z1 microscope (Zeiss, Oberkochen, Germany),
a CSU-X1 confocal scanner unit (Yokogawa Electric Corporation, Tokyo, Japan),
Plan-Apochromat 100x/1.4 numerical aperture DIC oil objective (Zeiss), Orca
Flash 4.0 sCMOS Camera (Hamamatsu, Hamamatsu City, Japan) and controlled by
Slidebook Software 6.0 (Intelligent Imaging Innovations). Images were acquired as
z-sections at 0.27 µm and maximal intensity projections were made with Slidebook
Software 6.0 (Intelligent Imaging Innovations). 

GST pulldown assays. GST pulldown experiments were performed using pre-blocked
GSH Sepharose beads in pulldown buffer (10 mM HEPES pH 7.5, 200 mM
NaCl, 0.05% Triton, 2.5% glycerol, 1 mM TCEP). GST-CENP-LN as bait at a 1 µM
concentration was incubated with NCPs as prey at a 3 µM concentration. The bait
was loaded onto 12 µl preblocked beads, before the prey was added. At the same
time, 1 µg of each protein was added into Laemmli sample loading buffer for the
input gel. The reaction volume was topped up to 40 µl with buffer and incubated
at 4°C for 1 h under gentle rotation. Beads were spun down at 500g for 3 min. The
supernatant was removed and beads washed twice with 250 µl buffer. Supernatant
was removed completely, samples boiled in 15 µl Laemmli sample loading buffer
and run on a 14% SDS-PAGE gel. Bands were visualized with Coomassie brilliant
blue staining. Preblocking of GSH Sepharose beads: 750 µl of GSH Sepharose beads
were washed twice with 1 ml washing buffer (20 mM HEPES pH 7.5, 200 mM
NaCl) and incubated in 1 ml blocking buffer (20 mM HEPES pH 7.5, 500 mM
NaCl, 500 µg/µl BSA) overnight at 4°C with rotation. Beads were washed five times
with 1 ml washing buffer and resuspended in 500 µl washing buffer to have a 50/50
slurry of beads and buffer.

RNA interference. For CENP-C silencing, we used a single siRNA (target
sequence: 5'-GGAUCAUCUCAGAAUAGAA-3'; obtained from Sigma-Aldrich),
targeting the coding region of endogenous CENP-C mRNA. For
an efficient depletion, siRNA for CENP-C was transfected at a concentration
of 60 nM for 72 h. For CENP-M silencing, we used a combination of three
siRNA duplexes (target sequences: 5'-ACAAAAGGUCUGUGGCUAA-3';
5'-UUAAGCAGCUGGCGUGUUA-3'; 5'-GUGCUGACUCCAUAAACAU-3';
purchased from Thermo Scientific, Carlsbad, California, USA) targeting the
3'-UTR of endogenous CENP-M. CENP-M siRNA duplexes were used at 20 nM
each for 72 h as published*. For CENP-H a single siRNA (target sequence: 
5'-CUAGUGUCUCAUGGAUAA-3'; obtained from Dharmacon) targeting the
coding region of endogenous CENP-H mRNA was used at 100 nM for 72 h. For
CENP-L a single siRNA (target sequence: 5'-UUUAUCAGCCACAAGAUUA-3';
obtained from Dharmacon) targeting the coding region of endogenous CENP-L
was used at 100 nM for 72 h. Transfections of siRNA were performed with HiPerFect
(QIAGEN) according to the manufacturer’s instructions. Phenotypes were analysed
96 h after first siRNA addition and protein depletion was monitored by western 
blotting or immunofluorescence. 

Mammalian plasmids. Constructs were created by cDNA subcloning in pcDNA5/ 
FRT/TO-mCherry-IRES vector, a modified version of pcDNA5/FRT/TO vector 
(Invitrogen). pcDNA5/FRT/TO vector (Invitrogen) is a tetracycline-inducible 
expression vector designed for use with the Flp-In T-REx system. It carries a hybrid 
human cytomegalovirus/TetO2 promoter for high-level, tetracycline-regulated 
expression of the target gene. 

Cell culture. Parental Flp-In T-REx HeLa cells used to generate stable doxycycline- 
inducible cell lines were a gift from S. Taylor (University of Manchester, Manchester, 
UK). They were grown at 37°C in the presence of 5% CO2 in Dulbecco's
Modified Eagle’s Medium (DMEM; PAN Biotech) supplemented with 10%
TET-free Fetal Bovine Serum (Invitrogen), 2 mM L-glutamine (PAN-Biotech),
250 µg/ml hygromycin (Invitrogen, Carlsbad, California, USA) and 4 µg/ml
blasticidin (Invitrogen, Carlsbad, California, USA). The cell line was regularly
tested for mycoplasma contamination. 

Immunoblotting. RNAi-depleted cells for various CCAN components were 
harvested by trypsinization and lysed by incubation in lysis buffer (75 mM HEPES 
pH 7.5, 150 mM KCl, 1.5 mM EGTA, 1.5 mM MgCl2, 10% glycerol, 0.075% NP-40,
90 U/ml benzonase (Sigma), protease inhibitor cocktail (Serva)) at 4°C for 15 min
followed by sonication and centrifugation. Cleared lysate was washed with lysis
buffer, resuspended in Laemmli sample buffer, boiled, and analysed by western
blotting using 12% NuPAGE gels (Life Technologies). The following antibodies
were used: anti-Vinculin (mouse monoclonal, clone hVIN-1; 1:15,000;
Sigma-Aldrich, V9131), anti-α-tubulin (mouse monoclonal, Sigma-Aldrich T9026),
anti-CENP-C (rabbit polyclonal antibody SI410 raised against residues 23-410 of human
CENP-C; 1:1,200; ref. 10), anti-CENP-HK (rabbit polyclonal antibody SI0930
raised against the full length human CENP-HK complex; 1:1,000), anti-CENP-M
(rabbit polyclonal antibody raised against the full length human CENP-M),
anti-CENP-L (rabbit polyclonal, Acris Antibodies 17007-1-AP). Secondary antibodies
were affinity-purified anti-mouse (Amersham, part of GE Healthcare), anti-rabbit
or anti-mouse (Amersham) conjugated to horseradish peroxidase (1:10,000).
After incubation with ECL western blotting system (GE Healthcare), images were
acquired with ChemiDoc MP System (BioRad). Levels were adjusted with
ImageJ and Photoshop and images were cropped accordingly.

Immunofluorescence and quantification. Flp-In T-REx HeLa cells were grown on 
coverslips pre-coated with 0.01% poly-L-lysine (Sigma). Cells were fixed with
PBS/PHEM-paraformaldehyde 4% followed by permeabilization with
PBS/PHEM-Triton 0.5%. The following antibodies were used for immunostaining:
CREST/anti-centromere antibodies (human auto-immune serum, 1:100; Antibodies,
Davis, California), anti-CENP-C (SI410; 1:1,000, or the directly
Alexa488-conjugated form of this antibody, 1:400), anti-CENP-A mouse monoclonal
(GeneTex GTX13939, 1:500), anti-CENP-HK (SI0930; 1:800, or the directly
Alexa488-conjugated form of this antibody, 1:800). Rhodamine Red-conjugated, DyLight405-
conjugated secondary antibodies were purchased from Jackson ImmunoResearch 
Laboratories, West Grove, Pennsylvania, USA. Alexa Fluor 647-labelled secondary 
antibodies were from Invitrogen. Coverslips were mounted with Mowiol mounting 
media (Calbiochem). All experiments were imaged under identical conditions at 
room temperature using the spinning disk confocal microscopy of a 3i Marianas 
system (Intelligent Imaging Innovations, Denver, Colorado, USA) equipped with an 
Axio Observer Z1 microscope (Zeiss, Oberkochen, Germany), a CSU-X1 confocal 
scanner unit (Yokogawa Electric Corporation, Tokyo, Japan), Plan-Apochromat 
63x or 100x/1.4 numerical aperture objectives (Zeiss) and Orca Flash 4.0 sCMOS
Camera (Hamamatsu, Hamamatsu City, Japan) and converted into maximal intensity
projection TIFF files for illustrative purposes. Quantification of kinetochore
signals was performed on unmodified Z-series images using Imaris 7.3.4 software
(Bitplane, Zurich, Switzerland). Z-stacks of single cells were processed in Imaris by
creating an ellipsoid of 0.3 µm width and 1 µm height, which was positioned on the
CREST signal to cover most of the kinetochore signal in all channels. Four back- 
ground points with equal ellipsoid size and shape were set between kinetochore 
dots. Intensity values of single kinetochores were exported in a Microsoft Excel file 
and the average of the background values was subtracted from every kinetochore 
value. The mean of all kinetochore signals was taken. For each signal, the mean 
of the corrected values in mock-depleted cells was set to 1. All other values in 
perturbation experiments were then normalized to this value to derive the fraction 
of signal for each measured kinetochore protein compared with control cells. 
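
The background correction and normalization described above can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not the authors' Excel workflow: the function names and the intensity values are hypothetical, and the four background points per cell are averaged before subtraction, as the text describes.

```python
from statistics import mean


def background_corrected(kinetochore_values, background_values):
    """Subtract the mean of the background points from every kinetochore value."""
    background = mean(background_values)
    return [value - background for value in kinetochore_values]


def normalize_to_mock(treated_corrected, mock_corrected):
    """Express corrected values relative to the mock-depleted mean (set to 1)."""
    mock_mean = mean(mock_corrected)
    return [value / mock_mean for value in treated_corrected]


# Hypothetical intensities (arbitrary units); four background points per cell
mock = background_corrected([110.0, 130.0], [10.0, 10.0, 10.0, 10.0])
depleted = background_corrected([65.0, 65.0], [10.0, 10.0, 10.0, 10.0])

fractions = normalize_to_mock(depleted, mock)
print(f"Mean fraction of control signal: {mean(fractions):.2f}")
```

In this made-up example the depleted cells retain half of the control signal; real data would use the per-channel values exported from Imaris.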
Chemical crosslinking and mass spectrometry. Cross-linking analysis of
CENP-A NCP:CHIKMLN:KMN complex or CENP-A NCP:CHIKMLN complex was
performed with an equimolar mixture of light and heavy-labelled (deuterated)
bis[sulfosuccinimidyl] suberate (BS3-d0/d12, Creative Molecules). The complex
was incubated with 0.8 mM BS3 for 30 min at 30°C and the crosslinking reaction
was quenched by adding ammonium bicarbonate to a final concentration of
100 mM. Digestion with lysyl endopeptidase (Wako) was performed at 35°C in
6 M urea for 2 h (at an enzyme-substrate ratio of 1:50 w/w) and was followed by a
second digestion with trypsin (Promega) at 35°C overnight (also at 1:50 ratio
w/w). Digestion was stopped by the addition of 1% (v/v) trifluoroacetic acid (TFA).
Cross-linked peptides were enriched on a Superdex Peptide PC 3.2/30 column
(300 x 3.2 mm) at a flow rate of 25 µl/min with water/acetonitrile/TFA, 75:25:0.1,
as a mobile phase. Fractions were analysed by liquid chromatography coupled to
tandem mass spectrometry using a hybrid LTQ Orbitrap Elite (Thermo Scientific)
instrument. Cross-linked peptides were identified using xQuest11. False discovery
rates (FDRs) were estimated by using xProphet12 and results were filtered
according to the following parameters: FDR = 0.05, min delta score = 0.85, MS1
tolerance window of −4 to 4 ppm, ld-score > 22. The crosslinks were visualized
using the webserver xVis (http://xvis.genzentrum.lmu.de/) (ref. 37).
in 10 µl volume. Samples were then run on 0.75% agarose gel in 0.5x TBE at
4°C. Gels of unlabelled nucleosomes were stained with SYBRGold (Thermo Fisher
Scientific, Waltham, MA, USA) according to the manufacturer’s instructions. Gels
were imaged using a TyphoonTrio scanner (GE Healthcare, Chicago, Illinois, USA).
Quantification was performed using ImageJ, and analysis using Prism (Graphpad,
La Jolla, California, USA). CENP-A binding data were fitted with a quadratic 
binding equation. For CENP-A binding by CHIKMLN, a Hill equation with Hill 
Extended Data Figure 1 | Building blocks of the kinetochore. Schematic
organization of protein and subcomplexes used in this study, with essential 
structural features. CENP-A is a histone H3 variant. Crucial to its 
function in kinetochore assembly are the so-called CATD box and 

the C-terminal region, which are believed to interact with CENP-N 

and CENP-C, respectively''!°. For our reconstitution studies, we 
reconstituted human CENP-A:H4 tetramers and combined them with 

X. laevis H2A:H2B dimers. Nucleosome core particles containing 

histone H3 were reconstituted with X. laevis H3, H4, H2A, and H2B 

(see Methods). CENP-C can be thought of as a blueprint for kinetochore 
assembly, with binding motifs for outer and inner kinetochore subunits 
ordered from the N to the C terminus. The N-terminal region starts with 
a binding site for the Mis12 complex*®"!, followed by a binding site for the 
CENP-HIKM complex". Two related nucleosome-binding motifs have 
been identified, in the so-called ‘central region’ and ‘CENP-C motif”"’. 
The nucleosome-binding motifs interact with the H2A:H2B dimer and 
with the C-terminal region of CENP-A!". Finally, the dimerization motif 
has a cupin-like fold’”. The C-terminal region also binds to M18BP1 

(refs 43, 44), which is involved in CENP-A deposition. The two subunits 
of the CENP-LN complex have similar size and are structurally related, as 
revealed by the crystal structure of their S. cerevisiae homologues”. 

The four-subunit CENP-HIKM complex contains a tight subcomplex of 
the CENP-H and CENP-K subunits’’. CENP-M is a pseudo-Ras-like small 
GTPase that has lost the ability to bind GTP”. It interacts with CENP-I 


and is required for its stability’®, but no CENP-M orthologue has been 
identified in S. cerevisiae, whereas Ctf3 is the CENP-I orthologue in this 
organism (see Fig. 4a). Structurally, CENP-I may resemble the HEAT-repeat
α-solenoid structure of Importin-β (ref. 19). The four-subunit
NDC80 complex is crucial for microtubule-binding by kinetochores”*. 

It is a dumbbell-shaped, elongated protein with large coiled-coil 
domains”*°. Calponin-homology (CH) domains near the N terminus 

of the NDC80 and NUF2 subunits have been implicated in microtubule- 
binding’. The RWD domains of the SPC24 and SPC25 subunits target
the NDC80 complex to the kinetochore***” through interactions with 
the MIS12 complex. The four-subunit MIS12 complex remains structurally
uncharacterized, except for low-resolution negative-stain electron
microscopy analyses*”~°°, It is a hub of interactions, including interactions 
with the CENP-C complex (discussed above), the NDC80 complex (also 
discussed above), and the Knl1 subunit of the Knl1 complex*’. The two- 
subunit KNL1 complex plays a crucial role in spindle assembly checkpoint 
signalling*!. The C-terminal region of KNL1, the largest known core 
kinetochore subunit, consists of tandem RWD domains and is sufficient
for an interaction with the MIS12 complex*””’. A longer region,
comprising approximately the last 300 residues, is also sufficient for tight 
binding to ZWINT. For our studies, we used a construct encompassing 
residues KNL1(2000-2311) that was endowed with the ability to bind the
MIS12 complex and ZWINT. 
Extended Data Figure 2 | SEC analyses. The indicated samples
(at a concentration of 10 µM, 5 µM for nucleosome core particles) were
loaded on the indicated SEC column and the resulting elution fractions
were analysed by SDS-PAGE. a, CENP-LN complex. Note that CENP-L
and CENP-N migrate identically in these gels because of their almost
identical mass. They can be distinguished by selective addition of a tag,
as shown in Fig. 1c. b, CENP-C(1-544) complex. c, CENP-HIKM. d, KMN
network. e, CENP-A NCP; the lower panel is a MidoriGreen-stained agarose
gel of the same fractions analysed by SDS-PAGE in the upper panel.
f, H3 NCP; bottom panel as in e.
Extended Data Figure 3 | Additional CENP-A binding experiments.
a, EMSAs were used to assess relative binding affinity of H3 or CENP-A 
NCPs to the CENP-LN complex. Quantification of binding data 
predicts the indicated dissociation constants. In quantifications of 

a-c, the mean + s.d. from three independent experiments is shown. b, 
CENP:CHIKMLN was titrated against Alexa-647-labelled CENP-A NCPs 
(purple trace) or H3 NCPs (grey trace) in an EMSA assay. Experimental 
triplicates were performed, and the approximate dissociation constant 
determined. CENP:CHIKMLN binds with approximately sevenfold
higher affinity to CENP-A NCPs than to H3 NCPs. c, EMSA assays were 
performed using Alexa-647-labelled DNA in free form (black trace), 

in complex with H3 containing octamers (grey trace), or in complex 
with CENP-A containing octamers (purple trace). CENP-HIKM 
complex was titrated against the DNA or NCPs. No binding preference 
emerged. d, Biotinylated CENP-A NCPs were used as bait to pull down 


LETTER 


CENP:CHIKMLN complex. Interactions were then competed for using 

an increasing concentration of free (non-biotylated) CENP-A NCPs (left) 
or H3 NCPs (right). The ratio of CENP-I to CENP-A was plotted in the 
lower graph. Free CENP-A NCPs effectively compete off biotinylated 
CENP-A NCPs from the CHIKMLN complex. Free H3 NCPs are unable 
to do so, even at concentrations 20-fold the biotinylated bait. The assay 
was performed in 200 mM NaCl, and used a shorter construct of CENP-C 
(189-544) owing to the greater stability of this construct at lower salt 
concentrations and to better separation of the CENP-C and CENP-I bands 
on SDS-PAGE for analysis. e, Biotinylated nucleosomes were used as bait 
to pull down CENP:CHIKMLN complex. Pull-downs were performed 

at increasing salt concentrations from 100-300 mM NaCl. CENP-A 
nucleosomes maintained a strong interaction with CENP:CHIKMLN in 
300 mM salt. H3 NCPs lost the interaction with CENP: CHIKMLN at NaCl 
concentrations above 200 mM. 
Extended Data Figure 4 | Binding assays and analytical
ultracentrifugation. a, Normalized sedimentation coefficient (c(s)) 
distributions of the respective sedimentation velocity runs. The data were 
collected at 280 nm and the size distribution analysis of the sedimentation 
coefficient was performed with SEDFIT** software using a continuous c(s) 
model. The rotor was spun at 42,000 rpm and equilibrated at 20°C for 1 h
before the start of the run. b, Normalized c(s) distributions of the 
indicated sedimentation velocity runs. The data were collected at 497 nm 
(thus analysing signals from CENP-HI(57–C)KM complex labelled with
Alexa Fluor 488) and the size distribution analysis of the sedimentation 
coefficient was performed with SEDFIT using a continuous c(s) model. 
The rotor was spun at 42,000 rpm and equilibrated at 10°C for 2 h before 
the start of the run. We were unable to carry out runs with isolated 
CENP-C(1–544), CENP-C(189–544), or CENP-LN complex, owing to sample
instability during the centrifugation experiments. c, Normalized c(s) 
distributions of the indicated sedimentation velocity runs. The data were 
collected at 401 nm to monitor sedimentation of blue fluorescent protein 
(BFP) in chimaeric nucleosomes consisting of residues 2-75 of histone 
H3.1 and residues 75-140 of CENP-A (see Methods). The chimaeric 
nucleosomes were mixed with a threefold excess of CHIKMLN complex 
(containing CENP-C(189–544); that is, a construct devoid of the binding
domain for the MIS12 complex) to saturate binding. 
Extended Data Figure 5 | Kinetochore localization studies.
a, Representative images showing kinetochore levels in interphase cells
of CENP-A, CENP-C, and CENP-HK (with an antibody raised against
the CENP-HK complex) in Flp-In T-REx HeLa cells upon RNAi-based
depletion of the indicated proteins. Kinetochores were visualized with
anti-CENP-A sera. Scale bar, 10 µm. Magnification ×630. b, Western
blots documenting protein depletion. RNAi-based depletion of CENP-C
appears incomplete by western blotting, whereas it appears to be very
penetrant in immunofluorescence experiments. We have described this 
phenomenon before°, and found that decreased CENP-C silencing 
correlates with the higher degree of cell confluence (~80%) for the 
relatively large-scale RNAi preparations required for western blotting 
compared with immunofluorescence (where we start with cells at ~30% 
confluence). 
Extended Data Figure 6 | Incorporation of CENP-TWSX. In an in vitro
pull-down assay, CENP-A NCPs reconstituted with biotinylated DNA were
incubated with streptavidin-coated beads and the other recombinant
kinetochore species indicated in the INPUT (top) panel of the figure.
Beads were recovered by centrifugation and washed, and proteins bound
to the solid phase (PULLDOWN, bottom) were visualized by SDS-PAGE
followed by Coomassie blue staining. Binding of CENP-TWSX tetramer
(which contains only the histone fold domain of CENP-T) was contingent
on binding of CENP-C(1–544) and CENP-HI(57–C)KM.
Extended Data Figure 7 | Topology of the kinetochore. Using XL-MS,
the inter-peptide interactions within the kinetochore sample were
analysed. Intra-protein crosslinks are shown in red, intra-subcomplex
crosslinks are shown in orange, inter-subcomplex crosslinks in black.
Proteins are coloured according to their subcomplex: CENP-A NCP, purple;
CENP-C(1–544), red; CENP-HIKM, green; CENP-LN, blue; MIS12-C, peach;
KNL1-C, orange; NDC80-C, yellow.
Extended Data Figure 8 | Microtubule-binding experiments.
a, Description of reagents and buffer used in experiments in b and in
Fig. 3b. b, Rhodamine-labelled microtubules (red channel) were tethered
to glass coverslips and incubated in the presence of GFP-KMN (green),
Alexa-405-labelled CHIKMLN (blue), or Alexa-647-labelled CENP-A
NCP or H3 NCP (purple), and combinations thereof. Only CENP-A NCP
translocated to microtubules, whereas H3 NCP did not. Single microtubules
from these images have already been shown in Fig. 3b. c, Quantification
of experiments, already shown in Fig. 3b, c. Shown for each channel is
mean ± s.e.m. from at least 20 microtubules in at least 2 independent
experiments.
(red trace), and their combination (green trace) shows a stoichiometric
trace). No apparent shift of CENP-C(545–943) was observed. c, Summary
region of CENP-C (exemplified by CENP-C(545–943)) does not bind

core kinetochore components (this study) but interacts with CENP-A 
loading machinery, including the Mis18 complex, which in turn recruits 
the CENP-A chaperone HJURP**”?. Each half of CENP-C contains a 
nucleosome-binding motif, and has therefore the potential to interact with 
two adjacent nucleosomes. After DNA replication, when CENP-A levels 
are halved, CENP-A is replaced with H3 (H3.3, refs 54, 55). After mitosis, 
the C-terminal region of CENP-C contributes to recruiting machinery that
replaces H3 with CENP-A. 
CORRECTIONS & AMENDMENTS


CORRIGENDUM 
doi:10.1038/nature18962 


Corrigendum: Enhancer loops
appear stable during development
and are associated with paused
polymerase


Yad Ghavi-Helm, Felix A. Klein, Tibor Pakozdi, Lucia Ciglar, 
Daan Noordermeer, Wolfgang Huber & Eileen E. M. Furlong 


Nature 512, 96-100 (2014); doi:10.1038/nature13417 


In Fig. 2d of this Letter, two different representative stage 11 embryos 
were mistakenly shown for the green (scyl) and red (chrb) channels. The 
merge image showed the signal of both genes in the ‘scyl’ embryo. This 
has now been corrected in Fig. 1 of this Corrigendum, which shows the 
expression of both genes in the same embryo. Both genes are highly 
co-expressed, even more so than previously reported. 


Figure 1 | This figure shows the corrected Fig. 2d of the original Letter.
Panels show scyl, chrb and the merged image.


CORRIGENDUM 
doi:10.1038/nature18931 


Corrigendum: Malaria: 
Thermoregulation in a parasite’s 
life cycle 


Jun Fang & Thomas F. McCutchan 


Nature 418, 742 (2002); doi:10.1038/418742a 


In the first paragraph of this Brief Communication, accession numbers
are provided for four rRNAs. The database that they refer to is incorrect 
(it should be the European Nucleotide Archive (ENA) rather than 
GenBank), the order in which the accessions are indicated is also 
incorrect, and there is an extra ‘3’ in one of the accession numbers. The
text should say: “...two mature A-type rRNAs (A1, A2; ENA accession 
numbers, AF503871 and AF503870) and two mature S-type rRNAs 
(S1, S2; ENA accession numbers, AF503869 and AF503868)...”. 


Palaeontologist David Burnham has landed on some of his best ideas while mindlessly digging for bones. 


Daydream and discover


Tedious daily work might feel frustrating, but idle thoughts 
can drum up just the right spark of scientific inspiration. 


BY EMILY SOHN 


When biologist Adrian Smith chose to
study ants, he approached the field
with ambitious questions and big
dreams of discovering how animal societies
work. The reality was much less glamorous.

To capture ant colonies to study in the lab, 
he digs human-sized holes and then plucks 
out thousands of insects, one by one. After 
six hours or more of this backbreaking work, 
Smith, who works at the North Carolina 
Museum of Natural Sciences in Raleigh, and 
his teammates sometimes discover that the 
queen is missing, or they've inadvertently cut 
her in half. They then have to start over again. 

Such mind-numbing work occupies researchers'
time in most specialities. By nature, science
depends on intensive data collection, repetition 
and replication. To cope with the tedium, expe- 
rienced scientists have found tricks for making 
the work more pleasurable, such as getting to 
know colleagues who are in the same trenches 
and keeping long-term goals in mind. 

Given the intense focus required for the bulk 
of their work, many scientists learn to value 
brainless tasks that allow them to zone out and 
indulge in free thinking. That can lead to crea- 
tive research ideas or ways to boost efficiency. 
Faced with day after day of doing the same 
thing, researchers who appreciate boredom 
can gain insight into their goals and priorities. 


SURVIVAL GAME 
Boredom is a typical part of the process of 
scientific discovery, which rarely happens in a 
day. Even when intriguing results emerge, it can 
take many months to write a paper, get it peer 
reviewed and complete multiple revisions, and 
then wait for publication before sharing discov- 
eries with the world. The first step towards cop- 
ing successfully can be simply to accept the drill. 
“We wouldn't be on the edge of discovery if
it was easy,” Smith says. “Sometimes, you're in
places where no one has really looked, or you're
seeing something no one has seen before. To
get to that point, it takes some tedious work.”
At such times, it can help to remember that 
the work might eventually bring bursts of exhil- 
aration or, sometimes, a real thrill of discovery, 
says David Burnham, a palaeontologist at the 
University of Kansas Biodiversity Institute & 
Natural History Museum in Lawrence. He regu- 
larly digs for dinosaur bones — an experience 
that resembles a notorious description of war, 
he says — it entails “long periods of boredom, 
interspersed with high excitement”.
Every trip begins months beforehand
with much tiresome preparation, such as fill- 
ing out forms to get excavation permits. Once 
his team arrives in the field, the group has to 
organize gear, drive on bumpy dirt roads and 
hike to a site where the researchers set up tents 
and equipment. Then comes the slow process of 
digging, often in hot or otherwise uncomfort- 
able conditions. The work starts with shovels 
and picks, but when fossil evidence begins to 
appear, the palaeontologists switch to smaller, 
more delicate tools to unearth what might end 
up being just unidentifiable shards of bone. Any 
finds that could add to the overall puzzle must 
be carefully wrapped and meticulously docu- 
mented before being taken to the lab, where an 
even more delicate process of excavation and 
investigation continues. 

As the hours disappear, Burnham keeps 
in mind the possibility that he might at any 
moment find a motherlode of bones or a fos- 
sil that will change everything. Equally moti- 
vating are sporadic discoveries that shed light 
on big questions. Sometimes, the pay-off can 
be huge. On one memorable dig in China, 
Burnham's team found a raptor that turned out 
to be a new species. During the monotonous 
fact-checking process required to verify the 
find, the team compared the new bones to those 
of a related raptor and realized that the relative
had grooved teeth, which suggested that it was 
venomous. That realization led to a paper, published
in the Proceedings of the National Academy
of Sciences in 2010, that described the first
venomous raptor ever known. 

Finds such as those are rewarding enough, 
he says, to confer a surprisingly high tolerance 
for boredom or similar discomfort. “That one 
piece of excitement is so exhilarating,” he says.
“It just gets into your blood and you have to
keep going.”

Telling others about your grand goals can 
be another way to endure tedium, suggests
David Hadley, an epidemiologist in Tampa,
Florida, who does both academic and industrial
research. He is developing a programme
that would help oncologists to settle on the best 
course of care for patients with cancer on the 
basis of treatment data from previous patients 
and other information such as their ages, gen- 
der and genetic variations. To get it right, he 
has to run a lot of computer simulations and 
then wait as a computer crunches data, some- 
times for up to a week. Often, results reveal 
mistakes that need to be fixed before the next 
simulation can be run. “It really helps to talk to 
other people about the big picture, not neces- 
sarily about what you are doing day to day, but 
about what you are trying to achieve overall,” 
he says. “In my case, it’s trying to help sick kids.
That is why I’m motivated to do it.”


GRUNT WORK TO GROWTH 

Just as musicians need to learn scales before 
they can improvise, grunt work is a necessary 
step towards designing studies to answer big 
questions, adds William Stoops, a behavioural 
pharmacologist at the University of Kentucky 
in Lexington. He has spent many hours super- 
vising research subjects as they interact with a 
computer to earn doses of addictive drugs, with 
the goal of working out what drives drug use 
and abuse, and finding treatments. “If you can’t 
understand what a subject is supposed to do in a
session, and you design an experiment that’s just
not feasible, it will fail,” he says. “Every graduate
student and postdoc learns this stuff from the 
ground up.” 

Frequently reminding yourself of the poten- 
tial pay-off can make delayed gratification more 
palatable, Smith says. “It sucks until it doesn’t” 
is a mantra that he repeated to himself on a trip 
this year to Florida, where he spent eight long, 
hot days roaming around forests getting bitten 
by mosquitoes while crawling on his hands and 
knees to look for ants. It was tough going until
he found what he was looking for: colonies of
Formica archboldi, a species that preys on other
ants and litters its nests with their carcasses. He
wanted to take them to his lab to study their prey
preferences and possible predatory behaviours.


BEATING BOREDOM


Tunes for tedium


When work gets boring, many researchers
tune out by tuning in, enduring repetitive
work by listening to music or stories.
Deciding what's best to listen to during
tedious tasks depends on the kind of work.
For tasks that require some attention but not
full focus, behavioural ecologist Margaret
Couvillon, who will soon teach entomology
at the Virginia Polytechnic Institute and
State University in Blacksburg, recommends
choosing familiar audio books. But to survive
long stretches of nothing punctuated by busy
periods, she prefers podcasts that are easy to
pause and resume. Favourites include: This
American Life, Invisibilia, Science Friday and
Wait Wait Don’t Tell Me.

Adrian Smith, an ant researcher at the
North Carolina Museum of Natural Sciences
in Raleigh, North Carolina, likes podcasts, too,
and recommends ones that match outside
interests. He enjoys comedy and pop-culture
themes: Bullseye, WTF, the Memory Palace
and the Bret Easton Ellis Podcast.

Music is another option. David Hadley,
an epidemiologist in Tampa, Florida, likes
to listen to stuff that’s familiar to him.
UK neuroscientist Dean Burnett prefers
classic mainstream pop that drowns out
distractions without being too stimulating.
He also points to a common belief that
video-game theme music is ideal for
boosting motivation. E.S.


Biologist Karen Warkentin counts frog eggs.

It could always be worse, adds neuroscien- 
tist Dean Burnett, who, as a graduate student 
at Cardiff University, UK, watched many rats 
navigate many mazes, tallying which direction 
the rats chose at each turn, to try to understand 
how they retrieved memories. Without a way 
to automate data collection, he would remind 
himself of the glamour of his previous job: 
embalming corpses for a medical school. 

He recommends starting work with your eyes 
open and the expectation that some tasks will be 
less fun than others. “People want to do science, 
and they have big lofty goals,” he says. “A lot is 
day-to-day work. There can’t be many jobs that 
are generally enjoyable all day, every day.” 

To make monotonous work more bearable, 
it can also be useful to schedule repetitive tasks 
to match your own ebbs and flows of energy, 
says Karen Warkentin, an integrative biologist 
at Boston University in Massachusetts. To do 
her work, she has forced herself to stay awake 
many nights in a dark lab, waiting for snakes 
to wake up and eat frog eggs. She has walked 
around a pond counting bundles of dozens of 
eggs, often recounting and recounting. And 
she has measured thousands of frogs as they 
grew from tadpoles to adults, all in the name of 
understanding plasticity in the early-life stages 
of amphibians, among other goals. After han- 
dling frogs all day, she spent evenings plugging 
numbers into spreadsheets and checking col- 
umns — clocking 16-hour work days in what 
she calls a “crazy marathon” of an experiment.

She prefers different times for different tasks. 
For her, early morning is usually best for crea- 
tive work, such as writing. When her brain feels 
fried, often after lunch or in the evening, she 
finds satisfaction in repetitive jobs. “You can feel
like, ‘Hey, I’m still being productive,’” she says.

Warkentin likes to work in silence, but many
researchers distract themselves by listening to
music, podcasts or books on tape (see ‘Tunes
for tedium’). Tedious times can also be a bonding
experience, Burnham says. While digging
for dinosaur bones, his field crew chats and
jokes around, creating memories and forging
friendships. “The best way to endure it is to
put together a field crew of people who are
like-minded and enthusiastic and really want
to be there,” he says. “Then you can sit around
and have fun.”

SEAN MATTSON/SMITHSONIAN TROPICAL RESEARCH INSTITUTE


BUILDING WITH BOREDOM 

Boredom isn’t just something to endure: it 
can carry value of its own, giving the brain 
uninhibited space to wander and wonder. 
As a graduate student, Smith had an idea 
while watching ants (Novomessor cockerelli) 
move around in a box for hours: what if he 
reunited a group of isolated worker ants 
with the queen instead of with the rest of 
the colony, as he had done in other experi- 
ments? The results were surprising: the queen 
attacked the main worker and rallied the rest 
of the workers to gang up on it. The discov- 
ery spawned two publications: one in the 
German journal Naturwissenschaften in 2011,
and the other in Animal Behaviour in 2012.

Smith also credits boredom for some unex- 
pected twists in his career. During bouts of 
daydreaming and podcast-listening while 
doing menial tasks, he decided to create a 
series of YouTube videos and launch a pod- 
cast, Age of Discovery, in which he interviews 
biologists about their careers. Developing 
those multimedia skills helped him to land 
his current job, which includes outreach and 
communications. “I spent countless hours 
thinking about whether I wanted to commit to 
things that were tangential to my research but 
turned out to not be tangential to my career,”
he says. “That stuff wouldn't have happened 
if I was just occupied in front of the computer 
writing all the time or whatever.” 

Boredom can also spark creative ways to 
minimize it. Frustrated with how long it took 
to run computer simulations for his software, 
Hadley more than once boosted his efficiency 
by rewriting programs created by others. “If 
you are only doing something once or twice, 
you can afford to wait a couple of seconds,” he 
says. “When you are doing permutations two 
million times, that’s two million seconds lost. 
It helps me reduce my downtime.”

These kinds of stories are being docu- 
mented in an emerging field of research on 
the value of boredom. In one study, Jennifer
Hunter, a PhD student at York University in 
Toronto, and her colleagues found that — after 
accounting for traits such as extroversion — 
people who are prone to boredom also report 
being curious types, adding to growing evidence
that boredom can breed innovation. “I
think it can be a huge catalyst,” she says. “Don't
ignore your boredom. It can tell you really
powerful things about what you're doing.”

As a career evolves, boredom can become
a state of comfort. Although Warkentin’s 
frog-counting work might sound tedious, 
she doesn't mind it; instead, she finds it
satisfying to be in the natural world and enjoy
serendipitous experiences with wildlife. 
She looks for the same personality fit when 
fielding applicants for her team. “When I’m 
recruiting students,” she says, “I’m like, ‘Does 
this sound like your idea of a good time?’”

The answer might be ‘no’, and those
feelings are worth paying attention to, says 
Margaret Couvillon, a behavioural ecolo- 
gist who recently completed a postdoc at 
the University of Sussex, UK, and will soon 
begin teaching entomology at the Virginia 
Polytechnic Institute and State University in 
Blacksburg. Couvillon started out as a neu- 
robiology PhD student, and found herself 
staring at slices of bird brains. As she slowly 
inserted probes into the tissue to find neurons, 
she became discontented. Her true interest 
was animal behaviour, and she realized that 
she really wanted to watch animals in action, 
not study their brains in the lab. 

When she transferred to an ecology pro- 
gramme elsewhere, she discovered that her 
experiments included plenty of tedious ele- 
ments, too. She has spent “many, many, many 
hours” watching videos of dancing bees (Apis 
mellifera) and timing and measuring their 
movements to determine where they for- 
age. She has also spent a lot of time sitting in 
front of honeybee feeders, counting insects 
that visit and waiting for long stretches when 
none come by. But Couvillon has discovered 
that she’s much happier enduring boring work 
when it addresses the questions that truly 
interest her. And with so much of her time 
taken up by mentally exhausting tasks, she has 
come to cherish the chance to sit by a honeybee
feeder on a nice day. She suggests keeping
expectations realistic — after all, nobody has 
a job that delivers eureka moments every day. 

She also recommends shadowing a vari- 
ety of scientists to see whether the daily real- 
ity seems appealing before committing to a 
speciality. “Not all dirty work is created the
same,” Couvillon says. “You have to have an
everyday life you can handle.”


Emily Sohn is a freelance journalist in 
Minneapolis, Minnesota. 
CORRECTION

The Careers feature ‘Visa to visit’ (Nature 
536, 365-366; 2016) wrongly stated 
that Kelsey Glennon asked students from 
indigenous tribes not to stand so close to 
her. She actually made the request of all 
her students. 
TRADE TALK
Star selector


As an astronomy PhD student at Harvard University in Cambridge,
Massachusetts, Nathan Sanders learnt statistical modelling to analyse
supernova explosions. Now, he works for Legendary Entertainment in nearby
Boston, applying those quantitative skills to predict which stars and
story lines can make a film into a commercial success.


When did you consider leaving astronomy? 

I had learned a new computational frame- 
work in a statistics course. As I applied those 
techniques for my thesis, I realized that I loved 
what I was doing, and the reason had more to 
do with the statistical models than the astron- 
omy applications. That made me open to new 
opportunities. I thought I would be doing a 
disservice to myself if I didn’t explore them. 


What appealed to you about this position? 
When I was hired in 2013, Legendary had just 
launched its applied-analytics division in Bos- 
ton. It felt like an opportunity to rethink and 
reinvent the way that companies pick which 
films to make and how to build support for 
them. The goal was to be the first in Holly- 
wood to make decisions on the basis of data 
and evidence rather than on intuition. 


Besides technical skills, what do you look for 
in candidates when you recruit? 
Communication is key. You have to be com- 
fortable with diverse concepts, and talk to 
business people, filmmakers and technical 
colleagues. 


How did you hear about this position? 

I emphasize the importance of volunteering 
and getting out into the community. As a first- 
year graduate student, I started a project called 
Astrobites, a collaborative writing project that 
creates a Reader’s Digest version of astronomy
literature. I also volunteered with an organiza- 
tion doing live science demonstrations. The 
executive director was a friend of the chief 
analytics officer at Legendary Entertainment. 
It was one of those random connections that 
so often creates a job opportunity, but that can 
be hard for scientists to foster if they are completely
focused on their thesis work.


INTERVIEW BY MONYA BAKER. 
This interview has been edited for length and clarity. 
See go.nature.com/2bix4y7 for more. 
SHE RESETS A DAY AND
DRIVES SAM HOME INSTEAD. 
NOT ENOUGH. 


“YOU HAD AN IDEA,”
MARIELLE SAYS.


“I TOTALLY HAD AN IDEA.
WAIT TILL I TELL YOU.”


“BEFORE YOU DO, CAN YOU
REMEMBER WHEN YOU HAD IT?”


THE CONVERSATION HAS BEEN 
ONGOING. SHE TRIES A WEEK. 
A MONTH. STILL THE DIAGRAMS ON 
THE TABLE LAYING OUT A PATH TO 
THE END OF EVERYTHING. 


SAM CARPOOLS WITH
KRISTIN KANZ, THE
OFFICE MANAGER. THEY
SHARE AN INTEREST
IN POLICE DRAMAS.
THEY DO NOT DISCUSS
COMPUTATIONAL PHYSICS.
THEN HE PUTS IN A REQUEST
FOR SUPERCOMPUTER ACCESS
AS PART OF HIS PERSONAL
PROJECT ALLOCATION.


“WE CAN DO IT IN
SOFTWARE,” HE SAYS,
BUBBLING OVER. SHE FEELS LIKE SHE'S FALLING AGAIN.


“HOW CAN YOU TEST IF WE’RE LIVING IN A 
SIMULATION USING SOFTWARE?” 


MAYBE IT WILL BE A BLIND ALLEY. BUT HE 
HAS A BRILLIANT IDEA ABOUT USING NESTED 
SIMULATIONS TO PREDICT THE OPERATING 
CONSTRAINTS OF THE MACHINE THE 
UNIVERSE COULD BE RUNNING ON. 


THE SMELL 


THE TASTE OF 
SALT ON SAM'S 
SKIN AFTER HE'S
BEEN RUNNING. 


SHE NUDGES HIM AWAY FROM 
INVESTIGATING THE GRANULAR 


“I JUST HAD A CRAZY THOUGHT ABOUT
THE SIMULATION HYPOTHESIS.”


HE'S HARDLY FINISHED THE
SENTENCE BEFORE SHE
RESETS TO FOUR MONTHS
EARLIER AND CONVINCES HIM
TO CHOOSE THAILAND.


SHE ENDURES A SECOND ROUND OF DRESS 
FITTINGS AND RECEPTION PLANNING.


TWO YEARS MORE SHE HOLDS IT TOGETHER. 


THEY HAD BEEN CHILDLESS IN THEIR OTHER 
LIFE, BUT THIS TIME THERE’S ELLE, AND 


EVERYTHING IS SO DIFFERENT.
ONE NIGHT, THEY'VE BEEN UP
FOR HOURS. FINALLY, THEIR 
COLICKY GIRL IS ASLEEP. 


THEY SIT ON THE FLOOR OF 
THE NURSERY, TOYS STREWN 
AROUND THEM, LEGLESS 
WITH FATIGUE. 


“WHOAH,” 

SAM SAYS. “I 
JUST HAD THE 
WILDEST IDEA.” 


“NO.” MARIELLE SNAPS 
AT HIM. “YOU DO NOT 
HAVE ANY IDEAS. YOU 
HEAR ME? NO IDEAS. 
NO EXPERIMENTS.” 


“NOT IF YOU KEEP 
GOING IT WON'T.” 


AFTERWARDS THERE IS A
STRANGE BRIGHTNESS IN HIS 
EYES THAT SHE MISTAKES 
FOR FEAR. FOR GRIEF. 


8 SEPTEMBER 2016 | VOL 537 | NATURE | 267 


SCIENCE FICTION


ELLE WILL NEED PICKING UP
FROM SCHOOL SOON. OR IS THE
UNIVERSE FROZEN WHEN SHE'S
IN THE GREY ROOM? PAUSED
WHILE A FINGER HOVERS OVER
THE DELETE BUTTON.


“I CAN DO IT – I CAN,” SHE SAYS.


“I MANAGED FOR 12 YEARS. DO YOU
KNOW HOW MANY TIMES I RESET?”

SEVEN YEARS LATER, ELLE IS AT
SCHOOL. MARIELLE IS AT HOME,
ILL. SHE USES SAM’S COMPUTER
TO CATCH UP ON WORK AND AS SHE
DOES HIS E-MAIL WINDOW POPS UP.


IT’S AN ACCEPTANCE FOR HIS PAPER
ON THE SIMULATION HYPOTHESIS.


THE EDITOR OF THE JOURNAL IS
EFFUSIVE. HE COMMENDS SAM FOR
THE YEARS OF WORK HE'S PUT IN.


YEARS. 


MARIELLE PUTS HER 
HEAD ON THE TABLE. 
SHE SOBS AND SOBS. 


“TOO MANY TIMES.” THE VOICE IS AS
COLOURLESS AS THE ROOM. “YOU COULD NOT 
PREVENT THIS KNOWLEDGE FROM SPREADING 
AS YOU PROMISED. A WIPE IS REQUIRED.” 
SHE RESETS TO
MID-PREGNANCY,
WHEN SHE AND
SAM WERE HIKING.
AN AUTUMN SUNDAY
FINDS THEM AT THE
TOP OF GROUSE
MOUNTAIN.


MARIELLE LAGS
BEHIND. TELLS HIM
IT'S THE WEIGHT
OF THE BABY THAT
SLOWS HER DOWN,
RATHER THAN DREAD.


THEY WALK A
NARROW PATH.


EVEN KNOWING WHAT 
SHE KNOWS, IT’S 
HARD TO BELIEVE 
EVERYTHING IS CODE. 


ELLE, GROWING 
INSIDE HER. 


THE SUN IN 
THE BLUE SKY.


HER HUSBAND. 


SHE IS SO CLOSE
TO HIM NOW.

Wihe =~ FORWAR 